The entropy per coordinate of a random vector is 
highly constrained under convexity conditions 
Abstract — The entropy per coordinate in a log-concave random 
vector of any dimension with given density at the mode is 
shown to have a range of just 1. Uniform distributions on 
convex bodies are at the lower end of this range, the distribution 
with i.i.d. exponentially distributed coordinates is at the upper 
end, and the normal is exactly in the middle. Thus in terms 
of the amount of randomness as measured by entropy per 
coordinate, any log-concave random vector of any dimension 
contains randomness that differs from that in the normal random 
variable with the same maximal density value by at most 1/2. 
As applications, we obtain an information-theoretic formulation 
of the famous hyperplane conjecture in convex geometry, en- 
tropy bounds for certain infinitely divisible distributions, and 
quantitative estimates for the behavior of the density at the 
mode on convolution. More generally, one may consider so-called 
convex or hyperbolic probability measures on Euclidean spaces; 
we give new constraints on entropy per coordinate for this class 
of measures, which generalize our results under the log-concavity 
assumption, expose the extremal role of multivariate Pareto-type 
distributions, and give some applications. 

Index Terms — Maximum entropy; log-concave; slicing prob- 
lem; inequalities; convex measures. 

I. Introduction 

A probability density function (or simply "density") $f$ defined on the linear space $\mathbb{R}^n$ is said to be log-concave if

$$f(\alpha x + (1-\alpha) y) \ge f(x)^{\alpha} f(y)^{1-\alpha} \tag{1}$$

for each $x, y \in \mathbb{R}^n$ and each $0 \le \alpha \le 1$. If $f$ is log-concave, we will also use the adjective "log-concave" for a random variable $X$ distributed according to $f$, and for the probability measure induced by it. (For discussion of the justification for such terminology, see the beginning of Section VI.) Given a random vector $X = (X_1, \ldots, X_n)$ in $\mathbb{R}^n$ with density $f(x)$, introduce the entropy functional

$$h(f) = -\int_{\mathbb{R}^n} f(x) \log f(x)\,dx,$$

provided that the integral exists in the Lebesgue sense; as usual, we also denote this $h(X)$. Our main contribution in this paper is the observation that when viewed appropriately, every log-concave random vector has approximately the same entropy per coordinate as a related Gaussian vector.

Log-concavity has been deeply studied in probability, statistics, 
optimization and geometry, and there are a number of 
results that show that log-concave random vectors resemble 
Gaussian random vectors. For instance, several functional 
inequalities that hold for Gaussians also hold for appropriate 
subclasses of log-concave distributions (see, e.g., [5], [15], 
[-1] for discussion of Poincaré and logarithmic Sobolev inequalities 
for log-concave measures). Observe that this is 
not at all obvious at first glance: log-concave probability 
measures include a large variety of distributions including the 
uniform distribution on any compact, convex set, the (one- 
sided) exponential distribution, and of course any Gaussian. In 
this note, we give a strong (quantitative) information-theoretic 
basis to the intuition that log-concave distributions resemble 
Gaussian distributions. 

To motivate our main results, we first observe that for (one-dimensional) log-concave random variables $X$,

$$h(X) \approx \log \sigma, \tag{2}$$

where $\sigma$ is the standard deviation of $X$. (An exact result 
to this effect is contained in Proposition II.1 and proved in 
Section II.) An upper bound for entropy in terms of standard 
deviation clearly follows from the maximum entropy property 
of the Gaussian; so it is the lower bound that is not obvious 
here. Thus the property (2) may be viewed as asserting 
comparability between the entropy of a one-dimensional log- 
concave density and that of a Gaussian density with the same 
standard deviation. 

Our main purpose in this note is to describe a way to capture the spirit of the statement (2) in the setting of (multidimensional) random vectors. To describe this extension, recall that the $L^\infty$ norm of a measurable function $f : \mathbb{R}^n \to \mathbb{R}$ is defined as its essential supremum with respect to Lebesgue measure, $\|f\|_\infty = \operatorname{ess\,sup}_x f(x)$. Throughout this paper, we will write $\|f\| = \|f\|_\infty$ for brevity. Any log-concave $f$ is continuous and bounded on the supporting set $\Omega = \{x : f(x) > 0\}$, so we can simply write $\|f\| = \max_{x \in \Omega} f(x)$. 

Theorem I.1. If a random vector $X$ in $\mathbb{R}^n$ has a log-concave density $f$, let $Z$ in $\mathbb{R}^n$ be any normally distributed random vector with maximum density being the same as that of $X$. Then

$$\frac{1}{n} h(Z) - \frac{1}{2} \le \frac{1}{n} h(X) \le \frac{1}{n} h(Z) + \frac{1}{2}.$$

Equality holds in the lower bound if and only if $X$ is uniformly distributed on a convex set with non-empty interior. Equality holds in the upper bound if $X$ has coordinates that are i.i.d. exponentially distributed. 
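
As an illustration of Theorem I.1 in dimension $n = 1$, the following sketch numerically evaluates $h(X) + \log\|f\|$ for three log-concave densities. The function names, integration ranges and step count are choices made here for illustration and are not part of the paper; a plain midpoint rule is assumed to be accurate enough for these smooth densities.

```python
import math

def entropy_and_peak(density, lo, hi, steps=200_000):
    """Midpoint-rule estimates of h(f) = -∫ f log f and of max f on [lo, hi]."""
    width = (hi - lo) / steps
    entropy, peak = 0.0, 0.0
    for k in range(steps):
        fx = density(lo + (k + 0.5) * width)
        if fx > 0.0:
            entropy -= fx * math.log(fx) * width
            peak = max(peak, fx)
    return entropy, peak

def normalized_entropy(density, lo, hi):
    """h(X) + log ||f||; for n = 1 and log-concave f this should lie in [0, 1]."""
    entropy, peak = entropy_and_peak(density, lo, hi)
    return entropy + math.log(peak)

def uniform(x):
    return 1.0                      # uniform density on [0, 1]

def normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exponential(x):
    return math.exp(-x)             # one-sided exponential on [0, inf)

# Infinite supports are truncated where the tails are negligible.
values = {
    "uniform": normalized_entropy(uniform, 0.0, 1.0),
    "normal": normalized_entropy(normal, -12.0, 12.0),
    "exponential": normalized_entropy(exponential, 0.0, 40.0),
}

for name, value in values.items():
    print(f"{name:12s} h(X) + log||f|| = {value:.4f}")
```

The three values land close to 0, 1/2 and 1 respectively, matching the lower end, the middle and the upper end of the range described in the theorem.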

The observation that it is useful to consider Gaussian comparisons by matching $\|f\|$ rather than the first two moments may be considered the key observation of this paper. Theorem I.1 follows easily from the following basic proposition (both are proved in Section IV). 

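Proposition I.2. If a random vector $X$ in $\mathbb{R}^n$ has density $f$, then

$$\frac{1}{n} h(X) \ge \log \|f\|^{-1/n}.$$

If, in addition, $f$ is log-concave, then

$$\frac{1}{n} h(X) \le 1 + \log \|f\|^{-1/n}.$$

Observe that the lower bound here is trivial, since

$$h(X) = -\int f(x) \log f(x)\,dx \ge -\int f(x) \log \|f\|\,dx = \log \|f\|^{-1}.$$
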
On the other hand, let us point out that the upper bound in Proposition I.2 improves upon the naive Gaussian maximum entropy bound (obtained without a log-concavity assumption). Indeed, if the covariance matrix $R$ of $X$ with entries $R_{ij} = \mathrm{cov}(X_i, X_j)$ is fixed, then $h(X)$ is maximized for the normal distribution. This property leads to the upper bound

$$\frac{1}{n} h(X) \le C + \log \sigma, \tag{3}$$

where $\sigma = \det^{1/(2n)}(R)$ and $C = \log\sqrt{2\pi e}$. Now, according to one general comparison principle (stated in Section III), in the class of all probability densities, the quantity $\sigma \|f\|^{1/n}$ is minimized for the uniform distribution on ellipsoids. This property yields

$$\sigma \ge c\,\|f\|^{-1/n} \tag{4}$$

for some universal constant $c > 0$. Hence, modulo the constant $C$, (3) would indeed be improved if we replace $\sigma$ with $\|f\|^{-1/n}$.

While Proposition I.2 is already remarkable in its own right, log-concavity is a relatively strong assumption, and it would be advantageous to loosen it. Inspired by this objective, one wishes to study more general classes of probability distributions, satisfying weaker convexity conditions (in comparison with log-concavity). As a natural generalization, we consider probability densities of the form

$$f(x) = \varphi(x)^{-\beta}, \qquad x \in \Omega, \tag{5}$$

where $\varphi$ is a positive convex function on an open convex set $\Omega$ in $\mathbb{R}^n$. To see that this is a natural generalization, observe that any log-concave density is of this form for any $\beta > 0$ since the exponential function composed with a convex function is convex, and that log-concave distributions have finite moments of all orders, whereas densities of the form (5) can be heavy-tailed. For example, the Cauchy distribution on the real line has density $f(x) = [\pi(1 + x^2)]^{-1} = \varphi(x)^{-2}$ with $\varphi(x) = \sqrt{\pi(1 + x^2)}$ being convex, although it is certainly not log-concave. 

Another example, which is of significant relevance to our development, is the $n$-dimensional Pareto distribution. For fixed parameters $\beta > n$ and $a > 0$, this has the density

$$f_{\beta,a}(x) = \frac{1}{Z_n(\beta, a)}\,(a + x_1 + \cdots + x_n)^{-\beta}, \qquad x_i > 0, \tag{6}$$

where $Z_n(\beta, a)$ is the normalizing factor, i.e.,

$$Z_n(\beta, a) = \int_{\{x_i > 0\}} (a + x_1 + \cdots + x_n)^{-\beta}\,dx.$$

(As shown in Lemma A.1, Pareto distributions with $\beta \le n$ do not exist, since $Z_n(\beta, a)$ is finite if and only if $\beta > n$.) 

Theorem I.3. If a random vector $X$ in $\mathbb{R}^n$ has a density $f$ of the form (5) with $\beta \ge n + 1$, and if $\|f\|$ is fixed, the entropy $h(X)$ is maximal for the $n$-dimensional Pareto distribution. 

Since $h(X) + \log\|f\|$ is an affine invariant, one may assume $\|f\| = 1$ without loss of generality. Also, put for definiteness $a = 1$ and write $Z(\beta) = Z_n(\beta, 1)$, and $X_\beta$ for the random vector with density $f_{\beta,1}$. Then Theorem I.3 may be equivalently written as

$$h(X) + \log\|f\| \le h(X_\beta) + \log\|f_{\beta,1}\|. \tag{7}$$

Moreover, as shown in the Appendix,

$$\frac{1}{Z(\beta)} = (\beta - 1)\cdots(\beta - n) = (\beta - 1)_n,$$

where $(b - 1)_n = \Gamma(b)/\Gamma(b - n)$ is the $n$-th falling factorial of $b - 1$, and (7) takes the form

$$h(X) + \log\|f\| \le \beta \sum_{i=1}^{n} \frac{1}{\beta - i}. \tag{8}$$

Hence we recover Proposition I.2 in the limit as $\beta \to +\infty$. 
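
The limiting statement can be checked numerically. The sketch below assumes that the right-hand side of (8) equals $\beta \sum_{i=1}^{n} 1/(\beta - i)$, which is what a direct computation of $h(X_\beta) + \log\|f_{\beta,1}\|$ gives; the function names and the change of variables are illustrative choices, not taken from the paper.

```python
import math

def pareto_bound(beta, n):
    """beta * sum_{i=1}^n 1/(beta - i), the assumed right-hand side of (8)."""
    if beta <= n:
        raise ValueError("the Pareto density exists only for beta > n")
    return beta * sum(1.0 / (beta - i) for i in range(1, n + 1))

def pareto_1d_normalized_entropy(beta, steps=200_000):
    """Midpoint-rule value of h(X_beta) + log ||f_{beta,1}|| for n = 1, beta > 2."""
    # The substitution x = u / (1 - u) maps (0, 1) onto (0, inf), with
    # dx = du / (1 - u)^2 and f(x) = (beta - 1) * (1 - u)^beta.
    width = 1.0 / steps
    entropy = 0.0
    for k in range(steps):
        v = 1.0 - (k + 0.5) * width           # v = 1 - u, strictly inside (0, 1)
        fx = (beta - 1.0) * v ** beta
        entropy -= fx * math.log(fx) * v ** -2 * width
    return entropy + math.log(beta - 1.0)      # ||f|| = f(0) = beta - 1

print(pareto_1d_normalized_entropy(3.0), pareto_bound(3.0, 1))
print([round(pareto_bound(b, 5), 4) for b in (6.0, 10.0, 100.0, 1e6)])
```

For $n = 1$ the numerical entropy agrees with the closed form, and for fixed $n$ the bound decreases towards $n$ as $\beta$ grows, which is the constant appearing in Proposition I.2.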

It is convenient, for the sake of comparison with Proposition I.2, to write some consequences of Theorem I.3 in the following form. 

Corollary I.4. For the range $\beta \ge \beta_0 n$ with fixed $\beta_0 > 1$ (and still for $\beta \ge n + 1$), we have

$$\frac{1}{n} h(X) \le C_{\beta_0} + \log\|f\|^{-1/n},$$

where the constant $C_{\beta_0}$ depends on $\beta_0$ only. In fact, one may take $C_{\beta_0} = \frac{\beta_0}{\beta_0 - 1}$. However, in the larger range $\beta \ge \beta_0 + n$ with fixed $\beta_0 > 1$,

$$\frac{1}{n} h(X) \le \log\|f\|^{-1/n} + O(\log n),$$

where the $O(\log n)$ term may be explicitly bounded. 

For the range $\beta \le n$, it is not possible to control $h(X)$ in terms of $\|f\|$. In this case $h(X) + \log\|f\|$ may be as large as we wish (which can be seen on the example of the Pareto distribution with $\beta \to n$). One explanation for this observation could be the fact that the measures with densities (5) for $\beta < n$ may not be convex (see Remark VI.2), or viewed another way that there do not exist Pareto distributions for $\beta \le n$ (see Lemma A.1). Thus, we still have a gap $n < \beta < n + 1$, where Theorem I.3 is not applicable, and we cannot say whether one may bound $h(X)$ in terms of $\|f\|$. 

For ease of navigation, let us outline how this note is organized. In Sections II and III, we expand on the motivation for considering Proposition I.2 by proving the statements (2) and (4). 

In Section IV, we prove our main results for log-concave probability measures. In particular, Proposition I.2 and Theorem I.1 emerge as consequences of a more general result that bounds the Rényi entropy of any order $p \ge 1$ using the maximum of the density. As a corollary of this, we also show that any two Rényi entropies become comparable for the class of log-concave densities. 

In Section V, we use the preceding development to give a new and easy-to-state entropic formulation of the famous slicing or hyperplane conjecture. Indeed, the hyperplane conjecture can be formulated as a multidimensional analogue of the property (2), different from the multidimensional analogue already represented by Theorem I.1. Specifically, if $D(f)$ is the "entropic distance" of $f$ from Gaussianity (defined precisely later), then the property (2) may be rewritten in the form $0 \le D(f) \le c$ for some constant $c$ and every one-dimensional log-concave density $f$, whereas the hyperplane conjecture is shown to be equivalent to the statement that $D(f) \le cn$ for some universal constant $c$ and every log-concave density $f$ on $\mathbb{R}^n$. Furthermore, existing partial results on the slicing problem are used to deduce a universal bound on $D(f)$ for all log-concave densities $f$ on $\mathbb{R}^n$, although the dominant term in this bound is $\frac{1}{4} n \log n$ rather than linear in $n$. 

Section VI begins the study of a more general class of probability measures, the so-called "convex probability measures". Sections VII and VIII are dedicated to proving Theorem I.3 (and Corollary I.4); the former describes some necessary tools including a result on norms of convex functions, and we complete the proof in the latter. 

Section IX develops several applications: to entropy rates of certain discrete-time stochastic processes under convexity conditions, to approximating the entropy of certain infinitely divisible distributions, and to giving a quantitative version of an inequality of Junge concerning the behavior of $\|f\|$ on convolution. We end in Section X with some discussion. 

II. One-dimensional log-concave distributions 

Proposition II.1. For a one-dimensional log-concave random variable $X$ with standard deviation $\sigma$,

$$\log(C_0 \sigma) \le h(X) \le \log(C_1 \sigma)$$

for some positive constants $C_0$, $C_1$. The optimal constant $C_1 = \sqrt{2\pi e}$ is achieved for the normal, and the optimal constant $C_0 \ge 1/\sqrt{2}$. 

Proof: The upper bound holds without the log-concavity assumption, and is obtained simply by using the Gaussian entropy. 

Since $f$ is log-concave, it is supported on an interval $(a, b)$ (where $a$ may take the value $-\infty$ and $b$ may take the value $\infty$), and moreover, it is strictly positive on this support interval (being of the form $e^{-\varphi}$ with $\varphi$ convex). If $F$ is the cumulative distribution function of $f$ restricted to $(a, b)$, its inverse $F^{-1} : (0, 1) \to (a, b)$ is well defined since the positivity of $f$ implies that $F$ strictly increases on the support interval. Now consider the function

$$I(t) = f(F^{-1}(t)), \qquad 0 < t < 1.$$

In [10, Proposition A1], it was shown that $f$ is log-concave if and only if $I$ is positive and concave on $(0, 1)$. Hence for all $t \in (0, 1)$, $\frac{1}{2} I(t) \le \frac{1}{2}[I(t) + I(1 - t)] \le I(\frac{1}{2})$, so that

$$I(t) \le 2 I(\tfrac{1}{2}) = 2 f(m).$$

Taking the supremum over all $t$, one obtains

$$\max_x f(x) \le 2 f(m). \tag{9}$$

For one-dimensional log-concave densities $f(x)$, it was shown in [,_., Proposition 4.1] that

$$\frac{1}{12} \le \sigma^2 f(m)^2 \le \frac{1}{2}, \tag{10}$$

where $m$ is the median. Combining (9) and (10) gives

$$\frac{1}{12} \le \sigma^2 \max_x f(x)^2 \le 2,$$

or

$$\sigma/\sqrt{2} \le \|f\|^{-1} \le \sqrt{12}\,\sigma. \tag{11}$$

Applying Proposition I.2,

$$h(X) \ge \log\|f\|^{-1} \ge \log\sigma - \tfrac{1}{2}\log 2,$$

which is the desired lower bound. ■ 
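
The chain (11) and the two-sided bound of Proposition II.1 can be sanity-checked on a few familiar log-concave densities whose entropy, standard deviation and maximum density are known in closed form. The table of examples and the helper name `check` below are illustrative choices; a small tolerance is assumed to absorb floating-point rounding in the cases where equality holds.

```python
import math

TOL = 1e-12  # slack for rounding when a bound is attained exactly

# name: (entropy h(X), standard deviation sigma, maximum density ||f||)
EXAMPLES = {
    "uniform on [0,1]": (0.0, math.sqrt(1.0 / 12.0), 1.0),
    "standard normal": (0.5 * math.log(2 * math.pi * math.e), 1.0,
                        1.0 / math.sqrt(2 * math.pi)),
    "exponential(1)": (1.0, 1.0, 1.0),
    "Laplace(1)": (1.0 + math.log(2.0), math.sqrt(2.0), 0.5),
}

def check(h, sigma, peak):
    """Return (chain (11) holds, Proposition II.1 holds with C0 = 1/sqrt 2)."""
    inv_peak = 1.0 / peak
    chain_11 = (sigma / math.sqrt(2) <= inv_peak + TOL
                and inv_peak <= math.sqrt(12) * sigma + TOL)
    lower = math.log(sigma / math.sqrt(2)) <= h + TOL
    upper = h <= math.log(math.sqrt(2 * math.pi * math.e) * sigma) + TOL
    return chain_11, lower and upper

results = {name: check(*params) for name, params in EXAMPLES.items()}
for name, (chain_ok, entropy_ok) in results.items():
    print(f"{name:18s} (11): {chain_ok}   Prop. II.1: {entropy_ok}")
```

The uniform density attains the right-hand inequality of (11), and the normal attains the upper entropy bound, which is why the comparisons are made with a tolerance.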
Even in this one-dimensional setting, the best constant $C_0$ and corresponding extremal situations seem to be unknown; these would be interesting to identify. Note that the inequalities in (10) are sharp and are attained for the uniform and double exponential distributions. 

In Section V, we discuss the possible generalization of Proposition II.1 to general dimension $n$; this is related to the hyperplane conjecture. 

III. An extremal property of ellipsoids 

Here we recall the comparison property (4), mentioned in Section I, concerning an extremal property of ellipsoids. It goes back to the work of D. Hensley ([28], Lemma 2), who noticed that, if a probability density $f$ on $\mathbb{R}^n$ is maximized at the origin, then the quantity

$$f(0)^{2/n} \int |x|^2 f(x)\,dx$$

is minimized for the uniform distribution on the Euclidean balls centered at the origin. More precisely, Hensley considered only symmetric quasi-concave probability densities $f$, and later K. Ball ([6], Lemma 6) simplified the argument and extended this observation to all measurable densities $f$ satisfying $f(x) \le f(0)$ for all $x$. 

One may further generalize and strengthen this result, by applying affine transformations to the probability measures $\mu$ with densities $f$. 

Proposition III.1. Put

$$L_{\mu,\rho} = \int \rho\big(\|f\|^{1/n} |x|\big) f(x)\,dx,$$

where $\rho = \rho(t)$ is a given non-decreasing function in $t \ge 0$. In the class of all absolutely continuous probability measures $\mu$ on $\mathbb{R}^n$, the functional $L_{\mu,\rho}$ is minimized when $\mu$ is a uniform distribution on a Euclidean ball with center at the origin. 

Proof: Since $L_{\mu,\rho}$ is invariant under rescaling of $\mu$ (and so does not depend on $\|f\|$), we may assume $\|f\| = 1$. Denote by $\lambda$ the uniform distribution on the Euclidean ball $B(0, r_n)$ with center at the origin and volume one (so that $\omega_n r_n^n = 1$, where $\omega_n$ is the volume of the unit ball). We need to show that $L_{\mu,\rho} \ge L_{\lambda,\rho}$. Since both $L_{\mu,\rho}$ and $L_{\lambda,\rho}$ are linear with respect to $\rho$, it suffices to consider the case $\rho = 1_{(r,\infty)}$, the indicator function of a half-axis. Then the property $L_{\mu,\rho} \ge L_{\lambda,\rho}$ reads as

$$\mu\{|x| \le r\} \le \lambda\{|x| \le r\} = \begin{cases} \omega_n r^n, & \text{for } 0 \le r \le r_n, \\ 1, & \text{for } r > r_n. \end{cases}$$

This inequality is automatically fulfilled when $r > r_n$. In the other case, due to the assumption $f(x) \le 1$ (almost everywhere), we have

$$\mu\{|x| \le r\} = \int_{\{|x| \le r\}} f(x)\,dx \le \int_{\{|x| \le r\}} dx = \lambda\{|x| \le r\},$$

which is the statement. ■ 

As a corollary, we obtain the following observation. 



Corollary III.2. Let $X$ be a random vector in $\mathbb{R}^n$ with density $f$ and non-singular covariance matrix $R$. If $\sigma^2 = \det^{1/n}(R)$, then

$$\sigma^2 \ge c\,\|f\|^{-2/n}$$

for some universal constant $c > 0$. 

Proof: Let us return to the basic case $\rho(t) = t^2$. Thus, the functional $L_{\mu,\rho} = \|f\|^{2/n} \int |x|^2 f(x)\,dx$ is minimal for $\mu = \lambda$, the uniform distribution on the Euclidean ball $B(0, r_n)$ with center at the origin and volume one. Hence, the same is true for the functionals

$$\frac{1}{n}\,\|f\|^{2/n} \int |T(x - x_0)|^2 f(x)\,dx$$

for any point $x_0$ and any linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ with $|\det T| = 1$. Taking for $x_0$ the barycenter or mean of $\mu$, this functional may be written as

$$\frac{1}{n}\,\|f\|^{2/n}\,\mathrm{tr}\,\mathrm{Cov}(TX),$$

where $\mathrm{tr}\,\mathrm{Cov}(TX)$ denotes the trace of the covariance matrix of $T(X)$. Minimizing over all $T$'s, the above integral turns into

$$\|f\|^{2/n} (\det R)^{1/n}, \tag{12}$$

where $R$ is the covariance matrix of $X$. This follows from the classical representation (see, e.g., [9, Proposition II.3.20]) for the determinant of a positive-definite matrix $C$:

$$(\det C)^{1/n} = \min\left\{ \frac{\mathrm{tr}(CA)}{n} : A > 0,\ \det A = 1 \right\}.$$

The point is that the quantity (12) is invariant both under all shifts and all linear transforms of $\mathbb{R}^n$. In particular, it is constant for the uniform distribution on all ellipsoids, which thus minimize (12). Analytically, for any probability density $f$ on $\mathbb{R}^n$,

$$\|f\|^{2/n} (\det R)^{1/n} \ge \frac{1}{n} \int_{\{|x| \le r_n\}} |x|^2\,dx = \frac{1}{n + 2}\,\omega_n^{-2/n}.$$

Since $r_n$ is of order $\sqrt{n}$ for the growing dimension $n$, the right side is separated from zero by a universal constant. ■ 

In fact, this proof allows us to compute the optimal dimension-free constant. Recall that the volume of the unit ball is $\omega_n = \pi^{n/2}/\Gamma(\frac{n}{2} + 1)$. Restricting ourselves for simplicity to even dimension $n$, the optimal dimension-dependent lower bound becomes

$$\frac{\omega_n^{-2/n}}{n + 2} = \frac{[(n/2)!]^{2/n}}{\pi (n + 2)},$$

which by Stirling's approximation is multiplicatively well-approximated for large $n$ by

$$\frac{1}{2\pi e} \cdot \frac{n}{n + 2} \cdot (\pi n)^{1/n}.$$

As $n \to \infty$ through the subsequence of even numbers, this quantity converges to $c = (2\pi e)^{-1}$, which is therefore the optimal dimension-free constant. Observe that when Corollary III.2 is written with this dimension-free constant, equality is not attained for any finite dimension $n$ but only asymptotically. In Section V, we give a very simple proof of Corollary III.2 using entropy that also naturally yields the exact dimension-free constant. 
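
The convergence of the dimension-dependent bound to $(2\pi e)^{-1}$ can be seen numerically. The sketch below assumes that the dimension-dependent bound is $\omega_n^{-2/n}/(n+2)$ with $\omega_n = \pi^{n/2}/\Gamma(\frac{n}{2}+1)$, as in the computation above; it uses the log-gamma function so that all dimensions (not only even ones) can be handled without overflow. The function name is an illustrative choice.

```python
import math

def ball_lower_bound(n):
    """omega_n^(-2/n) / (n + 2), with omega_n = pi^(n/2) / Gamma(n/2 + 1)."""
    log_omega = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)
    return math.exp(-2.0 * log_omega / n) / (n + 2)

limit = 1.0 / (2.0 * math.pi * math.e)

for n in (1, 2, 10, 100, 10_000, 1_000_000):
    bound = ball_lower_bound(n)
    print(f"n = {n:>9d}   bound = {bound:.6f}   bound / limit = {bound / limit:.6f}")
```

For $n = 1$ the bound equals $1/12$, the sharp constant on the left of (10), and the ratio to the limit stays above 1 while approaching it, consistent with equality being attained only asymptotically.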

IV. Rényi entropies of log-concave distributions 

Recall the definition of the Rényi entropy of order $p$: for $p > 1$, and a random vector $X$ in $\mathbb{R}^n$ with density $f$,

$$h_p(X) = \frac{p}{p - 1} \log \frac{1}{\|f\|_p},$$

where

$$\|f\|_p = \left( \int_{\mathbb{R}^n} f(x)^p\,dx \right)^{1/p}$$

is the usual $L^p$-norm with respect to Lebesgue measure on $\mathbb{R}^n$. By continuity, $h_p(X)$ reduces to the Shannon differential entropy $h(X)$ as $p \downarrow 1$, and to $h_\infty(X) = \log\|f\|_\infty^{-1}$ as $p \to \infty$. The definition of $h_p(X)$ continues to make sense for $p \in (0, 1)$ even though $\|f\|_p$ is then not a norm. 

Theorem IV.1. Fix $p \in (1, \infty)$. If a random vector $X$ in $\mathbb{R}^n$ has density $f$, then

$$\frac{1}{n} h_p(X) \ge \log\|f\|^{-1/n},$$

with equality if and only if $X$ has the uniform distribution on any set of positive finite Lebesgue measure. If in addition, $f$ is log-concave, then

$$\frac{1}{n} h_p(X) \le \frac{\log p}{p - 1} + \log\|f\|^{-1/n},$$

with equality for the $n$-dimensional exponential distribution, concentrated on the positive orthant with density $f(x) = e^{-(x_1 + \cdots + x_n)}$, $x_i > 0$. 

Proof: The lower bound is trivial and holds without any assumption on the density. 

Let us derive the upper bound for $p > 1$. By definition of log-concavity, for any $x, y \in \mathbb{R}^n$,

$$f(tx + sy) \ge f(x)^t f(y)^s, \qquad t, s > 0,\ t + s = 1. \tag{13}$$

Taking the $t$-th root yields

$$f(tx + sy)^{1/t} \ge f(x)\,f(y)^{s/t}.$$

Integrating with respect to $x$ and using the assumption that $\int f = 1$, we get

$$t^{-n} \int f(x)^{1/t}\,dx \ge f(y)^{s/t}.$$

It remains to optimize over $y$'s, so that

$$\int f(x)^{1/t}\,dx \ge t^n\,\|f\|^{s/t}.$$

Taking $p = 1/t$ implies $\int f^p \ge p^{-n}\,\|f\|^{p-1}$, or

$$\|f\|_p^{-1} \le p^{n/p}\,\|f\|^{-(p-1)/p},$$

so that

$$h_p(X) \le \frac{p}{p - 1} \log\left[ p^{n/p}\,\|f\|^{-(p-1)/p} \right] = \frac{n}{p - 1} \log p + \log\|f\|^{-1}.$$

It is easy to check that a product of exponentials is an instance of equality. ■ 
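
The equality case can be checked numerically in dimension one. The sketch below uses the equivalent form $h_p(X) = \frac{1}{1-p}\log\int f^p$ of the Rényi entropy and a midpoint rule; the function names, truncation ranges and step count are assumptions made for illustration only.

```python
import math

def renyi_entropy(density, p, lo, hi, steps=100_000):
    """h_p = log(∫ f^p) / (1 - p) via the midpoint rule, for p > 1."""
    width = (hi - lo) / steps
    integral = sum(density(lo + (k + 0.5) * width) ** p for k in range(steps)) * width
    return math.log(integral) / (1.0 - p)

def upper_bound(p, peak, n=1):
    """Upper bound of Theorem IV.1 on h_p(X): n log(p)/(p - 1) + log(1/||f||)."""
    return n * math.log(p) / (p - 1.0) - math.log(peak)

def exponential(x):
    return math.exp(-x)

def normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

normal_peak = 1.0 / math.sqrt(2.0 * math.pi)

report = {}
for p in (2.0, 3.0):
    report[p] = {
        "exp": renyi_entropy(exponential, p, 0.0, 40.0),
        "exp_bound": upper_bound(p, 1.0),
        "normal": renyi_entropy(normal, p, -12.0, 12.0),
        "normal_bound": upper_bound(p, normal_peak),
    }
    print(p, report[p])
```

For the exponential density the computed $h_p$ coincides with the bound up to discretization error, while for the normal density it falls strictly between $\log\|f\|^{-1}$ and the bound.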

Thus a maximizer of the Rényi entropy of order $p$ under a log-concavity shape constraint and a supremum norm constraint is the exponential distribution, irrespective of $p$. This is not the only maximizer; indeed, affine transforms with determinant 1 of an exponentially distributed random vector will also work. Let us remark that if one instead imposes a variance constraint, the maximizers of Rényi entropy are Student's distributions as shown by Costa, Hero and Vignat [22], which specialize to the Gaussian for $p = 1$. (See also Johnson and Vignat [31] and Lutwak, Yang and Zhang [36], [37] for additional related results.) 

We may now prove some of the results stated in Section I. 

Proof of Proposition I.2: Note that Proposition I.2 is just a limiting version of Theorem IV.1, obtained by letting $p \downarrow 1$. However, it is not automatic, since there exist densities such that $h_p(X) = -\infty$ for every $p > 1$ but $h(X) > -\infty$. (An example of such a density is

$$f(x) = \frac{c}{x \log^3(1/x)}, \qquad 0 < x < \tfrac{1}{2},$$

where $c$ is a normalizing constant.) Note that by L'Hôpital's rule, what one needs to show is that

$$\lim_{p \downarrow 1} \left( -\frac{d}{dp} \log \int f(x)^p\,dx \right)$$

exists and equals $h(X)$. This calls for three limit interchanges, each of which can be justified by the Lebesgue dominated convergence theorem if $h_p(X)$ is finite for $p \in (1, 2]$. In our context of log-concave densities, this is always the case because of Theorem IV.1 and the boundedness of log-concave densities. Alternatively, a direct proof of Proposition I.2 can be given similar to that of Theorem IV.1 by integrating (13) with respect to $x$, maximizing over $y$, and then comparing derivatives in $t$ at $t = 1$. ■ 

Proof of Theorem I.1: To see the relationship with the Gaussian, simply observe that the maximum density of the $N(0, \sigma^2 I)$ distribution is $(2\pi\sigma^2)^{-n/2}$. (Here as usual, we use $N(\mu, \Sigma)$ to denote the Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$.) Thus matching the maximum density of $f$ and the isotropic normal $Z$ leads to $(2\pi\sigma^2)^{1/2} = \|f\|^{-1/n}$, and

$$\frac{1}{n} h(Z) = \frac{1}{2} \log(2\pi e \sigma^2) = \frac{1}{2} + \log\|f\|^{-1/n}.$$

This completes the proof of Theorem I.1. ■ 

Theorem IV.1 also implies that for log-concave random vectors, Rényi entropies of orders $p$ and $q$ are related for any $p, q \ge 1$. 

Corollary IV.2. If $X$ has a log-concave distribution on $\mathbb{R}^n$, and $p, q \in [1, \infty]$, then

$$\frac{h_p(X)}{n} - \frac{\log p}{p - 1} \le \frac{h_q(X)}{n}.$$

Since Theorem IV.1 is just the special case $q = \infty$ of Corollary IV.2, the two statements are mathematically equivalent. 

While the preceding discussion relies heavily on the value of the density at the mode, one can also extract information based on the value at the mean. Let $g : \mathbb{R}^n \to [0, \infty)$ be a log-concave function such that $\int g \in (0, \infty)$. Let $x_{\mathrm{mean}}$ be the barycenter or mean of $g$. Then it was shown by Fradelizi [25] that

$$\sup_x g(x) \le e^n g(x_{\mathrm{mean}}).$$

Combining Proposition I.2 with Fradelizi's lemma immediately yields the following corollary. 

Corollary IV.3. If a random vector $X$ has log-concave density $f$, with mean $x_{\mathrm{mean}}$ and mode $x_{\mathrm{mode}}$, then

$$h(X) \in \left[ \log f(x_{\mathrm{mean}})^{-1} - n,\ \log f(x_{\mathrm{mode}})^{-1} + n \right].$$
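
To make Corollary IV.3 concrete, the sketch below evaluates the interval for three one-dimensional log-concave densities with known entropy. The closed-form entropies (in particular $1 + \gamma$ for the gamma density $x e^{-x}$, with $\gamma$ the Euler-Mascheroni constant) are standard facts assumed here rather than taken from the paper, and the names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
NORMAL_PEAK = 1.0 / math.sqrt(2.0 * math.pi)

# name: (entropy h(X), density at the mean, density at the mode), all with n = 1
EXAMPLES = {
    "exponential(1)": (1.0, math.exp(-1.0), 1.0),
    "standard normal": (0.5 * math.log(2 * math.pi * math.e), NORMAL_PEAK, NORMAL_PEAK),
    "gamma(shape 2)": (1.0 + EULER_GAMMA, 2.0 * math.exp(-2.0), math.exp(-1.0)),
}

def entropy_interval(f_mean, f_mode, n=1):
    """[log 1/f(x_mean) - n, log 1/f(x_mode) + n] from Corollary IV.3."""
    return -math.log(f_mean) - n, -math.log(f_mode) + n

intervals = {}
for name, (h, f_mean, f_mode) in EXAMPLES.items():
    lo, hi = entropy_interval(f_mean, f_mode)
    intervals[name] = (lo, h, hi)
    print(f"{name:16s} {lo:.4f} <= h = {h:.4f} <= {hi:.4f}")
```

The exponential density sits at the upper end of its interval, as expected from the equality case of Proposition I.2.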

V. An Entropic Formulation of the Slicing Problem 

The main observation of this section is a relationship between the entropy distance to Gaussianity $D(f)$ and the isotropic constant $L_f$ for densities of convex measures. 

For a random vector $X$ with density $f$ on $\mathbb{R}^n$, the relative entropy from Gaussianity $D(f)$ or $D(X)$ is defined by

$$D(f) = \int f(x) \log \frac{f(x)}{g(x)}\,dx,$$

where $g$ is the density of the Gaussian distribution with the same mean and the same covariance matrix as $X$. If $Z$ has density $g$, then one may write $D(X) = h(Z) - h(X)$ (see, e.g., Cover and Thomas [24]). 

For any probability density function $f$ on $\mathbb{R}^n$ with covariance matrix $R$, define its isotropic constant $L_f$ by

$$L_f^2 = \|f\|^{2/n} \det{}^{1/n}(R).$$

The isotropic constant has a nice interpretation for uniform distributions on convex sets $K$. If one rescales $K$ (by a linear transformation) so that the volume of the convex set is 1 and the covariance matrix is a multiple of the identity, then $L_K^2 := L_f^2$ is the value of the multiple. 

Observe that both $D(f)$ and $L_f$ are affine invariants. The following result relating them may be viewed as an alternative form of Theorem I.1 relevant to matching first and second moments rather than the supremum norm. 

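Theorem V.1. For any density $f$ on $\mathbb{R}^n$,

$$\frac{1}{n} D(f) \le \log\left[ \sqrt{2\pi e}\,L_f \right],$$

with equality if and only if $f$ is the uniform density on some set of positive, finite Lebesgue measure. If $f$ is a log-concave density on $\mathbb{R}^n$, then

$$\frac{1}{n} D(f) \ge \log\left[ \sqrt{\frac{2\pi}{e}}\,L_f \right],$$

with equality if $f$ is a product of one-dimensional exponential densities. 

Proof: Let $X \sim f$ have covariance matrix $R$. If $Z \sim N(0, R)$,

$$h(Z) = \frac{1}{2} \log\left[ (2\pi e)^n \det(R) \right] = \frac{n}{2} \log(C\sigma^2),$$

where $\sigma^2 = \det(R)^{1/n}$ and $C = 2\pi e$. Thus

$$\frac{1}{n} D(X) = \frac{1}{n}\left[ h(Z) - h(X) \right] \le \frac{1}{2} \log(C\sigma^2) + \frac{1}{n} \log\|f\| = \frac{1}{2} \log\left[ C\sigma^2 \|f\|^{2/n} \right] = \log\left[ \sqrt{2\pi e}\,L_f \right]$$

and

$$\frac{1}{n} D(X) = \frac{1}{n}\left[ h(Z) - h(X) \right] \ge \frac{1}{2} \log(C\sigma^2) + \frac{1}{n} \log\|f\| - 1 = \frac{1}{2} \log\left[ \frac{2\pi}{e}\,\sigma^2 \|f\|^{2/n} \right] = \log\left[ \sqrt{\frac{2\pi}{e}}\,L_f \right],$$

where the inequalities come from Proposition I.2. ■ 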
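
The two bounds relating $D(f)$ and $L_f$ can be checked on densities with closed-form entropy. The sketch below assumes the definitions $L_f = \|f\|^{1/n}\det(R)^{1/(2n)}$ and $D(f) = h(Z) - h(X)$, with bounds $\log[\sqrt{2\pi/e}\,L_f] \le D(f)/n \le \log[\sqrt{2\pi e}\,L_f]$; the example table and function names are illustrative.

```python
import math

def isotropic_constant(peak, det_cov, n):
    """L_f = ||f||^(1/n) * det(R)^(1/(2n))."""
    return peak ** (1.0 / n) * det_cov ** (1.0 / (2.0 * n))

def distance_from_gaussianity(h, det_cov, n):
    """D(f) = h(Z) - h(X), with Z Gaussian of the same covariance as X."""
    h_gauss = 0.5 * (n * math.log(2 * math.pi * math.e) + math.log(det_cov))
    return h_gauss - h

def per_coordinate_bounds(peak, det_cov, n):
    """(lower, upper) bounds on D(f)/n; the lower one needs log-concavity."""
    L = isotropic_constant(peak, det_cov, n)
    return (math.log(math.sqrt(2 * math.pi / math.e) * L),
            math.log(math.sqrt(2 * math.pi * math.e) * L))

# name: (n, entropy h(X), ||f||, det of covariance)
EXAMPLES = {
    "uniform on [0,1]": (1, 0.0, 1.0, 1.0 / 12.0),
    "standard normal": (1, 0.5 * math.log(2 * math.pi * math.e),
                        1.0 / math.sqrt(2 * math.pi), 1.0),
    "3 iid exponentials": (3, 3.0, 1.0, 1.0),
}

summary = {}
for name, (n, h, peak, det_cov) in EXAMPLES.items():
    d = distance_from_gaussianity(h, det_cov, n) / n
    lower, upper = per_coordinate_bounds(peak, det_cov, n)
    summary[name] = (lower, d, upper)
    print(f"{name:20s} {lower:+.4f} <= D/n = {d:+.4f} <= {upper:+.4f}")
```

The uniform density attains the upper bound, the product of exponentials attains the lower bound, and the normal, with $D = 0$, sits exactly in the middle.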

Note that this immediately gives an extremely simple alternate proof of Corollary III.2. Indeed, since $D(f) \ge 0$, we trivially have

$$\sqrt{2\pi e}\,L_f \ge 1,$$

which is Corollary III.2 with the optimal dimension-free constant. 

On the other hand, whether or not the isotropic constant 
is bounded from above by a universal constant for the class 
of uniform distributions on convex bodies is an open problem 
that has attracted a lot of attention in the last 20 years. It was 
originally raised by J. Bourgain [19] in (a slight variation of) 
the following form. 

Conjecture V.2 (Slicing Problem or Hyperplane Conjecture). There exists a universal, positive constant $c$ (not depending on $n$) such that for any convex set $K$ of unit volume in $\mathbb{R}^n$, there exists a hyperplane $H$ such that the $(n - 1)$-dimensional volume of the section $K \cap H$ is bounded below by $c$. 



There are several equivalent formulations of the conjecture, 
all of a geometric or functional analytic flavor. Whereas 
Bourgain [iv] and Milman and Pajor [42] looked at aspects 
of the conjecture in the setting of centrally symmetric, convex 
bodies, a popular formulation developed by Ball [6] is that the 
isotropic constant of a log-concave measure in any Euclidean 
space is bounded above by a universal constant independent 
of dimension. Connections of this question with slices of $\kappa$-concave 
measures are described in [i2]. 

We will now demonstrate that the hyperplane conjecture 
has a formulation in purely information-theoretic terms. It is 
useful to start by mentioning the following equivalences. 

Corollary V.3. Let $c(n)$ be any non-decreasing sequence, and $c'(n) = c(n) + \frac{1}{2} \log(2\pi e)$. Then the following statements are equivalent: 

(i) For any log-concave density $f$ on $\mathbb{R}^n$, $L_f \le e^{c(n)}$. 

(ii) For any log-concave density $f$ on $\mathbb{R}^n$, $D(f) \le n c'(n)$. 

(iii) $\sup_f \min_g D(f \| g) \le n c'(n)$, where the minimum is taken over all Gaussian densities on $\mathbb{R}^n$, and the supremum is taken over all log-concave densities on $\mathbb{R}^n$. 

Proof: The equivalence of (i) and (ii) follows from Theorem V.1, and that of (ii) and (iii) follows from the easily verified fact that $D(f) = \min_g D(f \| g)$, where $g$ is allowed to run over all Gaussian distributions. ■ 

Furthermore, the seminal paper of Hensley [28] (cf. Milman and Pajor [42]) showed that for an isotropic convex body $K$, and any hyperplane $H$ passing through its barycenter,

$$c_1 \le L_K\,\mathrm{Vol}_{n-1}(K \cap H) \le c_2,$$

where $c_2 > c_1 > 0$ are universal constants. Hence the statements of Corollary V.3, when restricted to uniform distributions on convex sets, are also equivalent to the statement that

$$\mathrm{Vol}_{n-1}(K \cap H) \ge e^{-c(n)}.$$

Thus the slicing problem or the hyperplane conjecture is simply the conjecture that $c(n)$ can be taken to be constant (independent of $n$), in any of the statements of Corollary V.3. 

Conjecture V.4 (Entropic Form of Hyperplane Conjecture). For any log-concave density $f$ on $\mathbb{R}^n$ and some universal constant $c$,

$$\frac{D(f)}{n} \le c.$$

This gives a pleasing formulation of the slicing problem as a statement about the (dimension-free) closeness of an arbitrary log-concave measure to a Gaussian measure. 

Let us give another entropic formulation as a statement about the (dimension-free) closeness of an arbitrary log-concave measure to a product measure. If $f$ is an arbitrary density on $\mathbb{R}^n$ and $f_i$ denotes the $i$-th marginal of $f$, set

$$I(f) = D(f \,\|\, f_1 \otimes f_2 \otimes \cdots \otimes f_n);$$

this is the "distance from independence", or the relative 
entropy of / from the distribution of the random vector 
that has the same one-dimensional marginals as / but has 



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 






independent components. (For n = 2, this reduces to the 
mutual information.) 

Conjecture V.5 (Second Entropic Form of Hyperplane Conjecture). For any log-concave density $f$ on $\mathbb{R}^n$ and some universal constant $c$,

$$\frac{I(f)}{n} \le c.$$


Proof of equivalence of Conjectures V.4 and V.5: The following identity is often used in information theory: if $f$ is an arbitrary density on $\mathbb{R}^n$ and $f^{(0)}$ is the density of some product distribution (i.e., of a random vector with independent components), then

$$D(f \,\|\, f^{(0)}) = \sum_{i=1}^{n} D(f_i \,\|\, f_i^{(0)}) + I(f), \tag{14}$$

where $f_i$ and $f_i^{(0)}$ denote the $i$-th marginals of $f$ and $f^{(0)}$ respectively. 

Now Conjecture V.4 is equivalent to its restriction to those log-concave measures with zero mean and identity covariance (since $D(f)$ is an affine invariant). Applying the identity (14) to such measures,

$$D(f) = \sum_{i=1}^{n} D(f_i) + I(f),$$

since the standard normal is a product measure. The lower bound of Proposition II.1 asserts that $h(X) \ge C + \log\sigma$ for one-dimensional log-concave distributions; thus each $D(f_i)$ is bounded from above by some universal constant. Thus $D(f)$ being uniformly $O(n)$ is equivalent to $I(f)$ being uniformly $O(n)$. ■ 

Observe that mimicking Proposition II.1, Conjecture V.4 may be written in the form: for a log-concave random vector $X$ taking values in $\mathbb{R}^n$,

$$\frac{1}{n} h(X) \ge C + \log\sigma, \tag{15}$$

or

$$\frac{1}{n} h(X) \ge \frac{1}{n} h(Z) - C', \tag{16}$$

where $C$, $C'$ are universal constants, and $Z$ is the normal with the same covariance matrix as $X$. Owing to (4), the form (15) would strengthen the naive lower bound of Proposition I.2. As for form (16), it looks like the lower bound of Theorem I.1, except that the way in which the matching Gaussian is chosen is to match the covariance matrix rather than the maximum density. 

Existing partial results on the slicing problem already give 
insight into the closeness of log-concave measures to Gaussian 
measures. For many years, the best known bound in the slicing 
problem for general bounded convex sets, due to Bourgain [20] 
in the centrally-symmetric case and generalized by Paouris 
[44] to the non-symmetric case, was 

$$L_K \le c\,n^{1/4} \log(n + 1).$$

Recently Klartag [34] removed the $\log n$ factor and showed that $L_K \le c\,n^{1/4}$. Using a transference result of Ball [6] from convex bodies to log-concave functions, the same bound is seen to also apply to $L_f$, for a general log-concave density $f$. Combining this with Corollary V.3 leads immediately to the following result. 

Proposition V.6. There is a universal constant $c$ such that for any log-concave density $f$ on $\mathbb{R}^n$,

$$D(f) \le \frac{1}{4}\,n \log n + cn.$$

Note that the property (2) (quantified by Proposition II.1) for a one-dimensional log-concave density $f$ may be rewritten in the form $0 \le D(f) \le c$ for some constant $c$. Proposition V.6 is thus a multidimensional version of the statement (2). 

VI. Convexity of measures 

Convexity properties of probability distributions may be 
expressed in terms of inequalities of the Brunn-Minkowski 
type. A probability measure $\mu$ on $\mathbb{R}^n$ is called $\kappa$-concave, 
where $-\infty \le \kappa \le +\infty$, if it satisfies 

\[ \mu(tA + (1-t)B) \ge \big[ t\mu(A)^{\kappa} + (1-t)\mu(B)^{\kappa} \big]^{1/\kappa} \tag{17} \]

for all $t \in (0,1)$ and for all Borel measurable sets $A, B \subset \mathbb{R}^n$ 
with positive measure. Here $tA + (1-t)B = \{tx + (1-t)y : x \in A,\ y \in B\}$ 
stands for the Minkowski sum of the two sets. 
When $\kappa = 0$, the inequality (17) becomes 

\[ \mu(tA + (1-t)B) \ge \mu(A)^{t} \mu(B)^{1-t}, \]

and we arrive at the notion of a log-concave measure, 
introduced by Prékopa, cf. [46], [47], [35]. In the absolutely 
continuous case, the log-concavity of a measure is equivalent 
to the log-concavity of its density, as in (1). When $\kappa = -\infty$, 
the right-hand side is understood as $\min\{\mu(A), \mu(B)\}$. The 
inequality (17) becomes stronger as the parameter $\kappa$ 
increases, so in the case $\kappa = -\infty$ we obtain the largest class, 
whose members are called convex or hyperbolic probability 
measures. For general $\kappa$, the family of $\kappa$-concave measures 
was introduced and studied by C. Borell [17], [18]. 

A remarkable feature of this family is that many important 
geometric properties of $\kappa$-concave measures, like the 
properties expressed in terms of Khinchin and dilation-type 
inequalities, may be controlled by the parameter $\kappa$ only, and 
in essence do not depend on the dimension $n$ (although the 
dimension may appear in the density description of many 
$\kappa$-concave measures). 

A full characterization of $\kappa$-concave measures was given by 
C. Borell in [17], [18], cf. also [21]. Namely, any $\kappa$-concave 
probability measure is supported on some (relatively) open 
convex set $\Omega \subset \mathbb{R}^n$ and is absolutely continuous with respect 
to Lebesgue measure on $\Omega$. Necessarily, $\kappa \le 1/\dim(\Omega)$, and 
if $\Omega$ has dimension $n$, we have: 

Proposition VI.1. An absolutely continuous probability measure 
$\mu$ on $\mathbb{R}^n$ is $\kappa$-concave, where $-\infty \le \kappa \le 1/n$, if and 
only if $\mu$ is supported on an open convex set $\Omega \subset \mathbb{R}^n$, where 
it has a positive density $f$ such that, for all $t \in (0,1)$ and 
$x, y \in \Omega$, 

\[ f(tx + (1-t)y) \ge \big[ t f(x)^{\kappa_n} + (1-t) f(y)^{\kappa_n} \big]^{1/\kappa_n}, \tag{18} \]

where $\kappa_n = \kappa/(1 - n\kappa)$. 

Following [3], we call non-negative functions $f$ satisfying 
(18) $\kappa_n$-concave. Thus, $\mu$ is $\kappa$-concave if and only if $f$ is 
$\kappa_n$-concave. 

If $\kappa < 0$, one may represent the density in the form $f = \varphi^{-\beta}$ 
with $\beta \ge n$, $\kappa = -1/(\beta - n)$, where $\varphi$ is an arbitrary 
positive convex function on $\Omega$, satisfying the normalization 
condition $\int_\Omega \varphi^{-\beta}\,dx = 1$. Moreover, the condition $\beta \ge n+1$ 
like in Theorem I.3 corresponds to the range $-1 \le \kappa < 0$. 
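
As a quick sanity check on this bookkeeping, the following sketch converts between the three parameters. It assumes the standard relation $\kappa_n = \kappa/(1 - n\kappa)$ from Borell's characterization together with $\kappa = -1/(\beta - n)$; the function names are illustrative only and do not come from the paper. The point it illustrates is that $\kappa_n = -1/\beta$, which is exactly the statement that $\varphi = f^{-1/\beta}$ is convex.

```python
import math


def kappa_from_beta(beta: float, n: int) -> float:
    """kappa = -1/(beta - n) for a density f = phi**(-beta) on R^n (needs beta > n)."""
    if beta <= n:
        raise ValueError("need beta > n")
    return -1.0 / (beta - n)


def kappa_n(kappa: float, n: int) -> float:
    """Exponent kappa_n = kappa / (1 - n*kappa) appearing in (18), for kappa < 1/n."""
    return kappa / (1.0 - n * kappa)


n = 3
for beta in (n + 1, 2 * n, 10 * n):
    k = kappa_from_beta(beta, n)
    # f is kappa_n-concave with kappa_n = -1/beta, i.e. f**(-1/beta) is convex.
    assert math.isclose(kappa_n(k, n), -1.0 / beta)
    print(beta, k, kappa_n(k, n))
```

In particular $\beta = n + 1$ gives $\kappa = -1$, the endpoint of the range covered by Theorem I.3.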

Remark VI.2. Note that a density of form $f = \varphi^{-\beta}$, where 
$\varphi$ is an arbitrary positive convex function on $\Omega$ and $\beta < n$, 
need not be the density of a convex measure. Indeed, it is not 
unless $\varphi$ itself can be written as a convex function raised to a 
large enough power. 

Proposition VI.1 continues to hold without the normalization 
condition $\int_\Omega f\,dx = 1$. In particular, if $f$ is a positive 
$\kappa_n$-concave function on an open convex set $\Omega$ in $\mathbb{R}^n$, then the 
measure $d\mu(x) = f(x)\,dx$ is $\kappa$-concave, that is, it satisfies 
the Brunn-Minkowski-type inequality (17). For example, the 
Lebesgue measure on $\mathbb{R}^n$ is $\frac{1}{n}$-concave (in which case 
$\kappa_n = +\infty$). 

This sufficient condition will be used in dimension one as 
the following: 

Corollary VI.3. Let $\alpha > 0$. If $u$ is a positive concave function 
on an interval $(a, b) \subset \mathbb{R}$, then the measure on $(a, b)$ with 
density $u^{\alpha}$ is $\frac{1}{\alpha+1}$-concave. 

VII. Log-concavity of norms of convex functions 

In order to present the proof of our main result for 
$\kappa$-concave probability measures (which we will do in 
Section VIII), we first need to develop some functional-analytic 
preliminaries. 

Given a measurable function $f$ on a measurable set $\Omega \subset \mathbb{R}^n$ 
(of positive measure), we write 

\[ \|f\|_p = \Big( \int_\Omega |f|^p\,dx \Big)^{1/p}, \qquad -\infty < p < +\infty. \]

For the value $p = 0$, the above expression may be understood 
as the geometric mean $\|f\|_0 = \exp \int \log|f|\,dx$. 

It is easy to see that the function $p \mapsto \|f\|_p^p$ is 
log-convex (which is referred to as Lyapunov's inequality). C. 
Borell complemented this general property with the following 
remarkable observation ([16, Theorem 2]). 

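Proposition VII.1. If $\Omega$ is a convex body, and if $f$ is positive 
and concave on $\Omega$, then the function 

\[ p \mapsto C_{n+p}^{n}\, \|f\|_p^p = \frac{(p+1)\cdots(p+n)}{n!} \int_\Omega f^p\,dx \tag{19} \]

is log-concave for $p \ge 0$. 

Here we use the standard binomial coefficients 
$C_{n+p}^{n} = \frac{(p+1)\cdots(p+n)}{n!}$. 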

Borell's theorem, Proposition VII.1, may formally be generalized 
to the class of $\kappa$-concave functions $f$ with $\kappa > 0$, 
since then $f = \varphi^{1/\kappa}$ with concave $\varphi$, and one may apply the 
log-concavity result (19), as well as the inequality (21), to $\varphi$. 
However, for the purpose of proving Theorem I.3, with the 
aim of going beyond log-concave probability measures, we are 
mostly interested in the case where $\kappa < 0$, when the function 
$\varphi$ is convex. 

Thus what we require is a version of Proposition VII.1 for 
convex functions $\varphi$. The following theorem, proved in [14], 
supplies such a result. 

Theorem VII.2. If $\varphi$ is a positive, convex function on an open 
convex set $\Omega$ in $\mathbb{R}^n$, then the function 

\[ p \mapsto C_{p-1}^{n}\, \|\varphi\|_{-p}^{-p} = \frac{(p-1)\cdots(p-n)}{n!} \int_\Omega \varphi^{-p}\,dx \tag{20} \]

is log-concave on the half-axis $p \ge n+1$. 



It is interesting to note that Borell [16] obtained a different 
proof of Berwald's inequality [8], which is famous among 
functional analysts, as a consequence of Proposition VII.1. 

Proposition VII.3. For $0 < p < q$, 



(21) 



Equality is achieved when the normalized norms are constant, 
which corresponds to the linear function $f(x) = x_1 + \cdots + x_n$ 
on the convex body 

\[ \Omega = \{ x \in \mathbb{R}^n : x_i \ge 0,\ x_1 + \cdots + x_n \le 1 \}. \]

Berwald's inequality turns out to have interesting applica- 
tions to information theory as well as convex geometry (see 
[14], [13]). 

VIII. Entropy of $\kappa$-concave distributions 

In this section, we explain how to use the remarkable 
property of convex functions described by Theorem VII.2 in 
proving Theorem I.3. 

Proof: (of Theorem I.3.) Let $f = \varphi^{-\beta}$ be a probability 
density for a random vector $X$ in $\mathbb{R}^n$ with $\beta \ge n+1$, where 
$\varphi$ is as in Theorem VII.2. Define $f$ to be zero outside $\Omega$. As 
is shown in [ ], the density admits a bound 

\[ f(x) \le \frac{C}{(1+|x|)^{\beta}} \quad \text{for all } x \in \mathbb{R}^n \]

with some constant $C$, so $\varphi(x) \ge c\,(1+|x|)$ with some $c > 0$ 
(depending on $\varphi$). Hence, the function 

\[ V(p) = \log \int_\Omega \varphi^{-p}\,dx \]

is finite and differentiable for $p > n$ with 
$V'(p) = -\int \varphi^{-p} \log\varphi\,dx \,\big/ \int \varphi^{-p}\,dx$. In particular, 

\[ h(X) = -\beta V'(\beta). \tag{22} \]

To proceed, assume $\|f\| = 1$, that is, $\inf_\Omega \varphi = 1$. This 
assumption can be made since the quantity of interest in 
Theorem I.3, $h(X) + \log\|f\|$, is an affine invariant; so one can 
scale $X$ to make $\|f\| = 1$. Like in the proof of Theorem IV.1, 
for $t \in (0, 1]$, write 

\[ f(tx + (1-t)y) \ge \big[ t f(x)^{-1/\beta} + (1-t) f(y)^{-1/\beta} \big]^{-\beta}, \]

which is valid for any $x, y \in \Omega$. Integrating with respect to $x$ 
over the whole space, we get 

\[ t^{-n} \ge \int \big[ t f(x)^{-1/\beta} + (1-t) f(y)^{-1/\beta} \big]^{-\beta}\,dx \]

with equality at $t = 1$. Hence, we may compare derivatives of 
both sides with respect to $t$ at this point, which gives 

\[ n \ge \beta \Big( 1 - f(y)^{-1/\beta} \int f(x)^{1+1/\beta}\,dx \Big). \]

Optimizing over all $y$'s and using $\int f\,dx = 1$, we arrive at the 
bound 

\[ n \ge \beta \Big( 1 - \int_\Omega \varphi^{-(\beta+1)}\,dx \Big). \tag{23} \]

Note that for the Pareto distribution there is equality at every 
step made before (since in that case $\varphi$ is affine). 

Now, it is time to apply Theorem VII.2. Put 
$U(p) = \log[(p-1)\cdots(p-n)]$, so that $R(p) = U(p) + V(p)$ is 
concave on the half-axis $p \ge n+1$. The concavity implies 
that 

\[ R'(\beta) \ge R(\beta+1) - R(\beta). \]

Equivalently, since $V(\beta) = 0$ and $U(\beta+1) - U(\beta) = \log\frac{\beta}{\beta-n}$, 
we have 

\[ V'(\beta) \ge V(\beta+1) + \log\frac{\beta}{\beta-n} - U'(\beta). \tag{24} \]

But (23) is telling us that $V(\beta+1) \ge \log(1 - \frac{n}{\beta})$, so by (24), 

\[ V'(\beta) \ge -U'(\beta) = -\sum_{i=1}^{n} \frac{1}{\beta-i}. \]

With the representation (22) we arrive at the bound (8), 

\[ h(X) \le \sum_{i=1}^{n} \frac{\beta}{\beta-i}. \]

From Lemma A.2, this is recognized as the entropy of the 
$n$-dimensional Pareto density (6), and hence Theorem I.3 
describes an extremal property of the Pareto distribution. ■ 
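
The extremal case can be checked numerically in dimension one. The sketch below assumes that for $n = 1$ the Pareto density (6) takes the form $f(x) = (\beta-1)a^{\beta-1}(a+x)^{-\beta}$ on $x > 0$ (consistent with Lemma A.1), and compares a simple midpoint quadrature for $h(X)$ with the value $-\log\|f\| + \beta/(\beta-1)$ predicted by (8) with equality. The function names and the choice of quadrature are illustrative, not taken from the paper.

```python
import math


def pareto_entropy_numeric(beta: float, a: float, steps: int = 200_000) -> float:
    """Entropy of f(x) = (beta-1) a**(beta-1) (a+x)**(-beta), x > 0, by quadrature.

    With u = (a/(a+x))**(beta-1) (the survival function), du = -f(x) dx, so
    h = -int f log f dx becomes int_0^1 -log f(x(u)) du, evaluated by the
    midpoint rule on (0, 1).
    """
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        x = a * (u ** (-1.0 / (beta - 1.0)) - 1.0)
        log_f = (math.log(beta - 1.0) + (beta - 1.0) * math.log(a)
                 - beta * math.log(a + x))
        total -= log_f
    return total / steps


def pareto_entropy_bound(beta: float, a: float) -> float:
    """Right side of (8) for n = 1: -log ||f|| + beta/(beta-1), with ||f|| = f(0)."""
    sup_f = (beta - 1.0) / a
    return -math.log(sup_f) + beta / (beta - 1.0)


for beta, a in ((2.0, 1.0), (3.5, 0.7)):
    print(beta, a, pareto_entropy_numeric(beta, a), pareto_entropy_bound(beta, a))
```

The two columns agree up to quadrature error, which illustrates that the bound cannot be improved for this family.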

In fact, as done for log-concave distributions in Section IV, 
it is possible to obtain analogous bounds for the Rényi entropy 
of any order. We only state the result here; it is proved in [14]. 

Theorem VIII.1. Fix $p \in (1, \infty)$. If a random vector $X$ in $\mathbb{R}^n$ 
has density $f$ and a $\kappa$-concave distribution for $-1 \le \kappa < 0$, 
then 

\[ 0 \le \frac{1}{n} h_p(X) - \log\|f\|^{-1/n} \le \frac{1}{n(p-1)} \log \frac{(\beta p-1)\cdots(\beta p-n)}{(\beta-1)\cdots(\beta-n)}, \]

where $\beta = n - \frac{1}{\kappa}$. 

Consequently one has an extension of Corollary IV.2 to the 
convex measure case: for $X$ as in Theorem VIII.1 and any 
$1 < p < q < \infty$, 

hp{X) h,{X) 



< 



< 



1 " 



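To conclude this section, we show how Theorem I.3 implies 
Corollary I.4. 

Proof: (of Corollary I.4.) For Corollary I.4, observe that 
in the regime $\beta \ge \beta_0 n$, 

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{\beta}{\beta-i} \le \frac{\beta_0 n}{n} \sum_{i=1}^{n} \frac{1}{\beta_0 n - i} \le \frac{\beta_0 n}{\beta_0 n - n} = \frac{\beta_0}{\beta_0 - 1}. \]

On the other hand, in the regime $\beta \ge \beta_0 + n$, 

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{\beta}{\beta-i} \le \frac{\beta_0+n}{n} \sum_{i=1}^{n} \frac{1}{\beta_0+n-i} \le \frac{\beta_0+n}{n} \Big[ \frac{1}{\beta_0} + \log\frac{\beta_0+n-1}{\beta_0} \Big] = \Big(1 + \frac{\beta_0}{n}\Big) \Big[ \frac{1}{\beta_0} + \log\Big(1 + \frac{n-1}{\beta_0}\Big) \Big], \]

as long as $\beta_0 > 0$. This gives an explicit bound for the 
$O(\log n)$ term in the second part of Corollary I.4. ■ 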
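
To see how the per-coordinate term behaves in the first regime, one can simply evaluate the sum. The sketch below is illustrative only (the function name is not from the paper); it computes $\frac{1}{n}\sum_{i=1}^n \beta/(\beta-i)$ on the boundary $\beta = \beta_0 n$ and checks it against the dimension-free constant $\beta_0/(\beta_0-1)$.

```python
def per_coordinate_excess(beta: float, n: int) -> float:
    """(1/n) * sum_{i=1}^n beta/(beta - i): the right side of (8) per coordinate."""
    if beta <= n:
        raise ValueError("need beta > n")
    return sum(beta / (beta - i) for i in range(1, n + 1)) / n


beta0 = 2.0
for n in (1, 10, 100, 1000):
    beta = beta0 * n  # boundary of the regime beta >= beta0 * n
    value = per_coordinate_excess(beta, n)
    # Dimension-free bound from the proof of Corollary I.4.
    assert value <= beta0 / (beta0 - 1.0) + 1e-12
    print(n, round(value, 4))
```

For $\beta_0 = 2$ the bound is attained at $n = 1$, and the values decrease toward $2\log 2$ as $n$ grows, since the average is then a Riemann sum for $\int_0^1 \frac{2}{2-t}\,dt$.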

IX. Applications 

A. Entropy rates 

Our first application is to the entropy rate of (strongly) 
stationary log-concave random processes. We call a discrete-time 
stochastic process $\mathbf{X} = (X_i)$ log-concave if all its 
finite-dimensional marginals are log-concave distributions. In 
particular, for the process $\mathbf{X}$ to be log-concave, it is necessary 
and sufficient for the distribution of $X^n = (X_1, \ldots, X_n)$ to 
be log-concave for each $n$. Note that an important special case 
of a log-concave process is a Gaussian process. 

An important functional of a discrete-time stochastic process 
is its entropy rate, which is defined by 

\[ h(\mathbf{X}) = \lim_{n\to\infty} \frac{h(X^n)}{n} \]

when the limit exists. 

The only class of processes for which the computation of 
entropy rate is tractable is the class of stationary Gaussian 
processes. Indeed, a stationary zero mean Gaussian random 
process is completely described by its correlation function 
$r_{k,j} = r_{k-j} = E[X_k X_j]$ or, equivalently, by its power 
spectral density function $G$, the Fourier transform of the 
covariance function: 

\[ G(\lambda) = \sum_{k=-\infty}^{\infty} r_k e^{ik\lambda}, \qquad \lambda \in [-\pi, \pi]. \]

For a fixed positive integer $n$, the probability density function 
of $X^n$ is the normal density with $n \times n$ covariance matrix 
$R_n$, whose entries are $r_{k,j} = r_{k-j}$, and its entropy can be 
explicitly written. This yields 

\[ h(\mathbf{X}) = \tfrac{1}{2} \log(2\pi e) + \lim_{n\to\infty} \frac{1}{2n} \log\det(R_n). \]

Since $R_n$ is the Toeplitz matrix generated by the power 
spectral density $G$ (or equivalently by the coefficients $\{r_k\}$), 
one has from the theory of Toeplitz matrices (see, e.g., (1.11) 
in Gray [26]) that 

\[ h(\mathbf{X}) = \tfrac{1}{2} \log(2\pi e) + \frac{1}{4\pi} \int_{-\pi}^{\pi} \log G(\lambda)\,d\lambda. \]
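
The convergence of the block entropies to the spectral formula can be illustrated on a concrete example. The sketch below takes the assumed covariance sequence $r_k = \rho^{|k|}$ (an AR(1)-type process chosen only for illustration), for which $\frac{1}{2\pi}\int_{-\pi}^{\pi} \log G(\lambda)\,d\lambda = \log(1-\rho^2)$, and compares $h(X^n)/n$ computed from $\log\det(R_n)$ with the limiting value. The helper names are hypothetical, and the Cholesky routine is a plain textbook implementation rather than anything from the paper.

```python
import math


def logdet_spd(mat):
    """log det of a symmetric positive-definite matrix via Cholesky (pure Python)."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return logdet


def block_entropy_per_symbol(rho: float, n: int) -> float:
    """h(X^n)/n for the stationary Gaussian process with r_k = rho**|k|."""
    cov = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
    return 0.5 * math.log(2 * math.pi * math.e) + logdet_spd(cov) / (2 * n)


def spectral_entropy_rate(rho: float) -> float:
    """Limit from the spectral formula, using (1/2pi) int log G = log(1 - rho**2)."""
    return 0.5 * math.log(2 * math.pi * math.e) + 0.5 * math.log(1 - rho ** 2)


rho = 0.5
for n in (5, 20, 40):
    print(n, block_entropy_per_symbol(rho, n), spectral_entropy_rate(rho))
```

For this example $\det(R_n) = (1-\rho^2)^{n-1}$, so the gap to the limit is exactly $-\frac{1}{2n}\log(1-\rho^2)$ and decays like $1/n$.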



Below we point out that our inequalities give a way of obtain- 
ing some information about the entropy rate of a stationary 
log-concave process. 

Corollary IX.1. For any stationary process $\mathbf{X}$ whose finite 
dimensional marginals are absolutely continuous with respect 
to Lebesgue measure, let $f_n$ be the joint density of $X^n$. If 

\[ f_- := \liminf_{n\to\infty} \frac{1}{n} \log\|f_n\|^{-1} > -\infty, \]

then the entropy rate $h(\mathbf{X})$ exists and $h(\mathbf{X}) > -\infty$. If 
furthermore, $\mathbf{X}$ is a log-concave process and 

\[ f_+ := \limsup_{n\to\infty} \frac{1}{n} \log\|f_n\|^{-1} < +\infty, \]

then 

\[ h(\mathbf{X}) \le f_+ + 1. \]

Proof: Let $h(X|Y)$ denote conditional entropy. As is well 
known (see, e.g., Cover and Thomas [24]), for any stationary 
process $\mathbf{X}$, the sequence $a_n = h(X_n | X^{n-1})$ is a 
non-increasing sequence, since 

\[ h(X_n | X^{n-1}) \le h(X_n | X_2, \ldots, X_{n-1}) = h(X_{n-1} | X^{n-2}). \]

(Here one uses the fact that conditioning cannot increase 
entropy, and the assumed stationarity.) Note also that, by the chain rule, 

\[ b_n := \frac{h(X^n)}{n} = \frac{1}{n} \sum_{k=1}^{n} a_k. \]

Since 

\[ \liminf_{n\to\infty} b_n \ge \liminf_{n\to\infty} \frac{1}{n} \log\|f_n\|^{-1} = f_- > -\infty, \]

we must have $\liminf_{n\to\infty} a_n > -\infty$, which combined with 
the monotonicity of $a_n$ implies that the limit exists and is 
equal to some $a > -\infty$. Hence the limit of $b_n$, namely the 
entropy rate, also exists and is equal to $a$. The upper bound 
for the entropy rate follows from Proposition I.2. ■ 

One interesting class of processes where this result may be 
of utility, and where the study of entropy rate has attracted 
much recent interest, is the class of hidden Markov processes. 

Let us also note that reasoning similar to that in Corol- 
lary IX.l can be applied to bound the entropy rate of 
continuous-time stationary log-concave processes as well 
(modulo some additional technicalities). 

B. The behavior of maximum density on convolution 

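Our Proposition I.2 can be used to significantly generalize 
and improve an inequality of Junge [32] for the behavior of 
the maximum of a density on convolution. 

Corollary IX.2. Let $f$ be the density of a $\kappa$-concave measure 
on $\mathbb{R}^n$, where $\kappa \in [-1, 0]$. Then, for any $m \in \mathbb{N}$, the $m$-fold 
convolution $f^{*m}$ of $f$ with itself satisfies 

\[ \|f^{*m}\| \le \Big( \frac{e^{1-n\kappa}}{\sqrt{m}} \Big)^{n} \|f\|. \]

Proof: We wish to apply Corollary I.4, which requires 
$\beta \ge \max\{\beta_0 n, n+1\}$ for some $\beta_0 > 1$. Since $f$ is the density 
of a $\kappa$-concave measure, it is of the form $\varphi^{-\beta}$ with $\varphi$ convex, 
with $\kappa(\beta - n) = -1$ (see Section VI). Thus the optimal 
(dimension-dependent!) $\beta_0$ that can be chosen in applying 
Corollary I.4 is given by 

\[ -\frac{1}{\kappa} \ge \max\{1, n(\beta_0-1)\}, \quad \text{that is,} \quad -\kappa \le \min\Big\{ 1, \frac{1}{n(\beta_0-1)} \Big\}; \]

in other words, one may take $\beta_0 = 1 + (-\kappa n)^{-1}$, for which 
one has $C_{\beta_0} = \frac{\beta_0}{\beta_0-1} = 1 - n\kappa$. Now if $X_i \sim f$ are i.i.d., and 
$S_m = \sum_{i=1}^{m} X_i$, then 

\[ \log\|f^{*m}\|^{-1/n} + (1 - n\kappa) \ge \frac{1}{n} h(S_m) \ge \frac{1}{n} h(X_1) + \frac{1}{2}\log m \ge \log\|f\|^{-1/n} + \frac{1}{2}\log m, \]

by using Corollary I.4, the Shannon-Stam entropy power 
inequality [49], and the first part of Proposition I.2. 
Exponentiating yields the desired result. ■ 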
In particular, for a log-concave density $f$ on $\mathbb{R}^n$, 

\[ \|f^{*m}\| \le \Big( \frac{e}{\sqrt{m}} \Big)^{n} \|f\|. \]

Junge [32] proved that for a symmetric, log-concave 
density $f$, 

\[ \|f^{*m}\| \le \Big( \frac{c}{\sqrt{m}} \Big)^{n} \|f\| \tag{25} \]

for some universal constant $c$. Corollary IX.2 above generalizes 
this by removing the symmetry assumption, making the 
universal constant explicit, and slightly broadening the class 
of densities allowed; also the proof is far more elementary. 

Let us observe that in the three inequalities in the proof of 
Corollary IX.2, one is tight only for uniforms on convex sets, 
another only for Gaussians, and the third only for Pareto-type 
distributions; so Corollary IX.2 is always loose, although it 
is possible that $c = e$ could be the optimal dimension-free 
constant in (25). 

C. Infinitely divisible distributions 

Our third application is to estimating the entropy of certain 
infinitely divisible distributions. Let $X$ be a random vector in 
$\mathbb{R}^n$ with density $f(x)$ and characteristic function 

\[ \varphi(t) = E e^{i\langle t, X\rangle} = \int e^{i\langle t, x\rangle} f(x)\,dx. \]

Recall that the distribution of $X$ is infinitely divisible if $X$ can 
be realized as the sum of $M$ independent, identically distributed 
random vectors, for any natural number $M$. The Lévy-Khintchine 
representation theorem [48], [1] asserts that the distribution of 
$X$ is infinitely divisible if and only if there exist a symmetric 
nonnegative-definite $n \times n$ matrix $\Sigma$, $\gamma \in \mathbb{R}^n$ and a Lévy 
measure $\nu$ such that the characteristic function $\varphi(t)$ of $X$ is given by 

\[ \varphi(t) = \exp\Big\{ -\tfrac{1}{2}\langle \Sigma t, t\rangle + i\langle \gamma, t\rangle + \int_{\mathbb{R}^n} \Big( e^{i\langle x, t\rangle} - 1 - \frac{i\langle x, t\rangle}{1 + |x|^2} \Big) \nu(dx) \Big\} \]

for each $t \in \mathbb{R}^n$. Here, a measure $\nu$ on $\mathbb{R}^n$ is called a Lévy 
measure if it satisfies $\nu(\{0\}) = 0$ and $\int_{\mathbb{R}^n} (1 \wedge |x|^2)\,\nu(dx) < \infty$. 
The triplet $(\Sigma, \nu, \gamma)$ is called the Lévy-Khintchine triplet 
of $X$. We write $ID(\Sigma, \nu, \gamma)$ for the distribution of $X$, and 
use the abbreviation $ID(\nu) := ID(0, \nu, 0)$. Fixing $\gamma = 0$ 
is just fixing a location parameter and does not matter for 
the entropy, whereas fixing $\Sigma = 0$ means that the infinitely 
divisible measure has no Gaussian part. 

We start with a one-dimensional result. 

Corollary IX.3. Let $\nu$ be any log-concave Lévy measure 
supported on $(0, \infty)$. Assume that the density $m(x)$ of $\nu$ 
satisfies $m(0+) \ge 1$. Then if $f$ is the density of $ID(\nu)$, 

\[ h(ID(\nu)) \le 1 - \log\|f\|. \]

Proof: Yamazato [50] (see also Hansen [27] for an 
alternative proof) showed that for infinitely divisible measures 
supported on the positive real line, if the Lévy measure has a 
log-concave density $m$, then the density of the ID measure is 
log-concave if and only if $m(0+) \ge 1$. The bound therefore 
follows from Proposition I.2 applied with $n = 1$. ■ 

It is natural to ask how to bound the entropy $h(X)$ in terms 
of $\varphi$, especially when $f$ is not given explicitly but $\varphi$ is (which 
is typical in the case of infinitely divisible distributions). 
We show below that some explicit bounds may be given 
when we know something about convexity properties of the 
density. The idea is to utilize the Rényi entropy of order 2, 
since it is directly connected to the characteristic function by 
Plancherel's identity. 

To start with, assume $f$ is log-concave. Then by applying 
Corollary IV.2 with $p = 1$ and $q = 2$, one obtains 

\[ -\log\|f\|_2^2 \le h(X) \le n - \log\|f\|_2^2, \]

since by definition, $h_2(X) = -\log\|f\|_2^2$. Note that the lower 
bound here is universally true (for all random vectors $X$) as a 
consequence of Jensen's inequality; indeed, if $X$ has density $f$, 

\[ h(X) = E[-\log f(X)] \ge -\log E f(X) = h_2(X). \]

But Plancherel's formula asserts that $\|f\|_2^2 = (2\pi)^{-n} \|\varphi\|_2^2$. 
Hence: 

Proposition IX.4. Let $X$ be a log-concave random vector in 
$\mathbb{R}^n$ with characteristic function $\varphi(t)$. Then 

\[ n\log(2\pi) - \log\|\varphi\|_2^2 \le h(X) \le n\log(2\pi e) - \log\|\varphi\|_2^2, \]

where 

\[ \|\varphi\|_2^2 = \int_{\mathbb{R}^n} |\varphi(t)|^2\,dt. \]

Equivalently, 

\[ \log(2\pi) \le \frac{1}{n} h(X) + \log\|\varphi\|_2^{2/n} \le \log(2\pi e). \]

This gives a reasonably strong approximation for the entropy 
of a log-concave distribution that is only known through its 
characteristic function: the gap between the upper and lower 
bounds is just 1. 
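
As an illustration, the one-dimensional Gaussian case can be worked out in closed form. The sketch below assumes $X \sim N(0, \sigma^2)$, so that $|\varphi(t)|^2 = e^{-\sigma^2 t^2}$ and $\|\varphi\|_2^2 = \sqrt{\pi}/\sigma$, and evaluates both sides of the two-sided bound for $n = 1$; the function name is illustrative only.

```python
import math


def gaussian_bounds(sigma: float):
    """Return (lower, h, upper) in the characteristic-function bound for N(0, sigma^2)."""
    # |phi(t)|^2 = exp(-sigma^2 t^2), hence ||phi||_2^2 = sqrt(pi) / sigma.
    phi_norm_sq = math.sqrt(math.pi) / sigma
    lower = math.log(2 * math.pi) - math.log(phi_norm_sq)
    upper = math.log(2 * math.pi * math.e) - math.log(phi_norm_sq)
    h = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
    return lower, h, upper


for sigma in (0.3, 1.0, 5.0):
    lower, h, upper = gaussian_bounds(sigma)
    assert lower <= h <= upper
    print(sigma, round(h - lower, 4), round(upper - h, 4))
```

Whatever the variance, the Gaussian entropy sits $\frac{1}{2}(1 - \log 2) \approx 0.153$ above the lower bound, so in this example the lower bound is the tighter of the two.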

One would also hope to be able to bound the entropies of the 
non-normal stable laws (which are not log-concave). As a step 
in this direction, we have a generalization of Proposition IX.4 
to the $\kappa$-concave case. 

Theorem IX.5. If $X$ has a $\kappa$-concave distribution, 

\[ h_2(X) \le h(X) \le h_2(X) + \beta \sum_{i=1}^{n} \frac{1}{\beta-i}, \tag{26} \]

provided $\beta = n - \frac{1}{\kappa} \ge n+1$. 

The upper bound is easy to see using Theorem I.3 (more 
precisely, inequality (8)) and the first part of Theorem IV.1; 
the lower bound is as for Proposition IX.4. 

Observe that $h_2(X)$ can be explicitly computed in many 
interesting cases via Plancherel's formula. For instance, for 
one-dimensional symmetric $\alpha$-stable measures with characteristic 
function 

\[ \varphi_\alpha(t) = \exp(-|t|^{\alpha}), \]

one obtains 

\[ \|\varphi_\alpha\|_2^2 = 2\int_0^\infty \exp(-2t^{\alpha})\,dt = \frac{2^{1-1/\alpha}}{\alpha} \int_0^\infty s^{1/\alpha - 1} e^{-s}\,ds = \frac{2^{1-1/\alpha}}{\alpha}\, \Gamma\Big(\frac{1}{\alpha}\Big). \]

For the sake of illustration, let us apply our inequalities to 
approximating the entropy of the Cauchy distribution, which 
is also explicitly computable. Recall that the standard Cauchy 
distribution (stable index $\alpha = 1$, skewness parameter $\beta = 0$) 
has density 

\[ f(x) = \frac{1}{\pi(1+x^2)}, \qquad x \in \mathbb{R}, \]

and entropy $\log(4\pi) \approx 1.386 + \log\pi$. It is easy to check that 
the Cauchy distribution is $-1$-concave (i.e., one can choose 
$\beta = 2$), so applying Theorem I.3 gives 

\[ h(X) \le \log\pi + 2 \]

since $\|f\| = 1/\pi$. On the other hand, $h(X) \ge h_2(X) = 
-\log[(2\pi)^{-1}\Gamma(1)] = \log(2\pi)$, so that $h(X)$ is trapped in a 
range of width $2 - \log 2 \approx 1.307$ centered at approximately 
$1.347 + \log\pi$, which seems fairly good. 
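
The arithmetic in the Cauchy example is easy to reproduce. The following sketch only evaluates the three closed-form quantities quoted above (the exact entropy $\log(4\pi)$, the upper bound $\log\pi + 2$, and the lower bound $\log(2\pi)$); the variable names are illustrative.

```python
import math

h_true = math.log(4 * math.pi)   # exact entropy of the standard Cauchy law
upper = math.log(math.pi) + 2.0  # Theorem I.3 with n = 1, beta = 2, ||f|| = 1/pi
lower = math.log(2 * math.pi)    # h_2(X) via Plancherel, with ||phi||_2^2 = 1

width = upper - lower
center_offset = 0.5 * (upper + lower) - math.log(math.pi)

assert lower <= h_true <= upper
print("width of the range:", round(width, 3))
print("center minus log(pi):", round(center_offset, 3))
print("true entropy minus log(pi):", round(h_true - math.log(math.pi), 3))
```

The true value $\log 4 + \log\pi$ lies within about $0.04$ of the midpoint of the range in this example.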

For multivariate symmetric $\alpha$-stable probability measures, 
a representation of the Rényi entropy of order 2 is obtained 
by Molchanov [43], in terms of the volume of a star body 
associated with the measure. In particular, [43] uses the 
identity 

\[ \int f(x)^2\,dx = 2^{-n/\alpha} f(0) \]

for any symmetric $\alpha$-stable density $f$; as pointed out by a 
reviewer, the left side here is just the value of the 
self-convolution of $f$ at 0, and for $X, X'$ drawn independently from 
$f$, stability implies that $X + X'$ has the same distribution as 
$2^{1/\alpha} X$. It is well known that symmetric stable random 
vectors are unimodal with mode at 0 (see, e.g., Kanter [33]); 
hence one can rewrite this as 

\[ \frac{1}{n} h_2(X) = \log\|f\|^{-1/n} + \frac{1}{\alpha}\log 2. \]

However, this still does not seem as useful as using 
Plancherel's formula to connect with the characteristic function. 

It seems plausible that large classes of infinitely divisible 
distributions are $\kappa$-concave, although we do not know any 
existing general results in this direction (other than the 
log-concavity results mentioned earlier). If one were able to get 
estimates on $\kappa$ for an infinitely divisible distribution that 
is specified through its characteristic function, (26) would 
immediately yield an upper bound for entropy in terms of 
$\|\varphi\|_2^2$. 

Some negative results on $\kappa$-concavity of stable laws actually 
follow from the preceding discussion. Indeed, if $X$ is 
symmetric $\alpha$-stable and $\kappa$-concave, one has 

\[ \log\|f\|^{-1/n} + \frac{1}{\alpha}\log 2 = \frac{1}{n} h_2(X) \le \frac{1}{n} h(X) \le \log\|f\|^{-1/n} + \frac{1}{n}\sum_{i=1}^{n} \frac{\beta}{\beta-i} \le \log\|f\|^{-1/n} + (1 - \kappa n), \]

where the last inequality follows from a similar calculation as 
in the proof of Corollary IX.2. Thus one obtains 

\[ \kappa \le \frac{1}{n}\Big( 1 - \frac{\log 2}{\alpha} \Big). \]

This is rather loose, since we already know that for $\alpha < 2$, 
no symmetric $\alpha$-stable distribution can be log-concave (as 
it would otherwise have finite moments). However, it does 
give some negative information for $\alpha < \log 2$. For instance, 
it shows that for fixed dimension $n$, symmetric $\alpha$-stable 
distributions cease to be $\kappa$-concave for any fixed $\kappa \in (-\infty, 0)$ 
as $\alpha \to 0$. This leads us to the following conjecture. 

Conjecture IX.6. Any strictly stable probability measure on 
an infinite-dimensional separable Hilbert space is convex. In 
the finite-dimensional case, one has a threshold phenomenon: 
For fixed $\kappa < 0$, a spherically symmetric stable distribution 
of index $\alpha$ on $\mathbb{R}^n$ is $\kappa$-concave if and only if $\alpha > \alpha^*(\kappa, n)$, 
where $\alpha^*(\kappa, n) \in (0, 2]$ is a constant depending only on $\kappa$ 
and $n$. 

Recall that a random element $X$ in a separable Hilbert 
space is said to have a strictly stable distribution with index 
$\alpha \in (0, 2]$ if $X_1 + \cdots + X_n \overset{d}{=} n^{1/\alpha} X$, where the $X_i$ are 
independent random elements with the same distribution as 
$X$. 

D. Entropy of Mixtures 

Our fourth application is focused on estimating the entropy 
of scale mixtures of Gaussians (or more generally 
log-concave distributions). Such distributions are of great interest 
in Bayesian statistics. 

Suppose one starts with a log-concave density $f = e^{-\varphi}$, 
where $\varphi$ is convex. A scale mixture using a mixing distribution 
with density $m$ on the positive real line would have the density 

\[ f_{\mathrm{mix}}(x) = \int_0^\infty m(s)\, \frac{1}{s^n} \exp\Big( -\varphi\Big(\frac{x}{s}\Big) \Big)\,ds. \]

More generally, one can consider "multivariate scale mixtures" 
of form 

\[ f_{\mathrm{mix}}(x) = \int_{P(n)} m(A)\, f_A(x)\, \eta(dA), \]

where 

\[ f_A(x) = \frac{f(A^{-1}x)}{\det(A)} \]

is the density of $AX$ when $X$ is distributed according to $f$, 
and $\eta$ represents the restriction of the Haar measure on the 
general linear group $GL(n)$ equipped with the multiplicative 
operation to the subset $P(n)$ (which is both a semigroup with 
respect to matrix multiplication and a cone, but not a group) 
of positive-definite matrices. 

Note that lower bounds on entropy of mixtures are easy to 
obtain by using concavity of entropy, but upper bounds are in 
general difficult. Indeed, 

\[ h(f_{\mathrm{mix}}) \ge \int_{P(n)} m(A)\, h(f_A)\, \eta(dA) = \int_{P(n)} m(A)\,[h(f) + \log\det(A)]\, \eta(dA) = h(f) + \int_{P(n)} m(A) \log\det(A)\, \eta(dA). \]

On the other hand, one has an upper bound under a 
log-concavity assumption. 

Theorem IX.7. Suppose $f_{\mathrm{mix}}$ is a scale mixture of the 
log-concave density $f = e^{-\varphi}$, using a mixing distribution with 
density $m$ on the positive-definite cone $P(n)$. Assume $f$ has a 
mode at 0 (which for instance is the case when it is symmetric), 
and that $f_{\mathrm{mix}}$ is log-concave. Then 

\[ h(f_{\mathrm{mix}}) \le n + \varphi(0) - \log \int_{P(n)} \frac{m(A)}{\det(A)}\, \eta(dA). \]

Proof: The proof is obvious from Proposition I.2 and the 
fact that the mixture density must also have its mode at 0, so that 
$\|f_{\mathrm{mix}}\| = f_{\mathrm{mix}}(0) = e^{-\varphi(0)} \int_{P(n)} \frac{m(A)}{\det(A)}\, \eta(dA)$. ■ 

The condition in Theorem IX.7 that the mixture be 
log-concave may not be too onerous to check, at least in the case of 
mixtures involving a one-dimensional scale, i.e., $A = sI$, with 
$m$ now a prior on $\mathbb{R}_+$. A sufficient condition for $f_{\mathrm{mix}}$ to be 
log-concave is obtained by requiring that the integrand above is 
log-concave (thanks to the Prékopa-Leindler inequality), which 
means 

\[ \psi(x, s) := \varphi\Big(\frac{x}{s}\Big) - \log\frac{m(s)}{s^n} \]

is convex. For given $m$, this may be checked by verifying 
positive-definiteness of the $(n+1) \times (n+1)$ Hessian matrix 
of $\psi$ for $x \in \mathbb{R}^n$, $s \in (0, \infty)$. 

One has an even simpler statement for Gaussian mixtures, 
which already appears to be new. 

Corollary IX.8. Suppose the mixing distribution with density 
$m$ on the positive real line is slightly stronger than 
log-concave, in the sense that $\frac{n}{2}\log v - \log m(v)$ is convex. Writing 
$Z$ for a standard Gaussian on $\mathbb{R}^n$ with density $g$, let the 
random vector $Y = \sqrt{V}\, Z$, where $V$ is a scalar distributed 
according to $m$, have the density $g_{\mathrm{mix}}$. Then 

\[ \frac{n}{2} \int_0^\infty m(v) \log v\,dv \le h(g_{\mathrm{mix}}) - h(g) \le \frac{n}{2} - \log \int_0^\infty \frac{m(v)}{v^{n/2}}\,dv. \]

Proof: One can write 

\[ g_{\mathrm{mix}}(x) = \int_0^\infty \frac{m(v)}{(2\pi v)^{n/2}} \exp\Big( -\frac{|x|^2}{2v} \Big)\,dv, \]

since we are parametrizing using variance (rather than standard 
deviation). Also, $|x|^2/v$ is convex as a function of $(x, v) \in 
\mathbb{R}^n \times (0, \infty)$; indeed, the quadratic form induced by its Hessian 
matrix when evaluated at $(a, b) \in \mathbb{R}^n \times (0, \infty)$ is 
$\frac{2}{v^3} |va - bx|^2 \ge 0$. Combining this with the assumed 
log-concavity of $m(v)/v^{n/2}$, one finds that the integrand above is 
log-concave, and hence so is $g_{\mathrm{mix}}$ (by the Prékopa-Leindler 
inequality). Then the first inequality of Corollary IX.8 follows 
from concavity of entropy, while the second follows from 
Proposition I.2, using $\|g_{\mathrm{mix}}\| = g_{\mathrm{mix}}(0)$ and 
$h(g) = \frac{n}{2}\log(2\pi e)$. ■ 

A limitation (perhaps unavoidable) of this result is that as 
dimension increases, the shape requirement on the prior $m$ 
becomes increasingly stringent. 

X. Discussion 

A central result in our development was the identification 
of the maximizer of Renyi entropy under log-concavity and 
supremum norm constraints. We gave a number of probabilis- 
tic, information theoretic and convex geometric motivations 
for considering this entropy maximization problem. 

There are some other works in which both log-concavity 
and entropy appear, although they are only tangentially related 
to the substance of this paper. Log-concavity plays a role in 
a few other entropy bounding problems; see, for instance, 
Cover and Zhang [23] and Yu [52]. Log-concavity (in the 
discrete sense) also turns out to be relevant to the behavior of 
discrete entropy; see Johnson [ ] and [30] for examples. For 
instance, Johnson [ ] showed that the Poisson is maximum 
entropy among all ultra-log-concave distributions on the 
non-negative integers with fixed mean (ultra-log-concavity is a 
strengthening of discrete log-concavity). 

For completeness, let us also mention that a different 
maximum entropy characterization of one-dimensional generalized 
Pareto distributions was given by Bercher and Vignat [7]. 
However, their characterization is rather different: in particular, 
they use Rényi and Tsallis entropies rather than Shannon 
entropy, and also it is not clear what the motivation is for 
the somewhat artificial moment and normalization constraints 
they impose. While [ , ] claims a connection to the Balkema-de 
Haan-Pickands theorem for limiting distribution of excesses 
over a threshold, log-concavity does not play a role in their 
development. 

Our main goal in this paper was to better understand the be- 
havior of entropy for log-concave (and more generally, hyper- 
bolic) probability measures, particularly as regards phenomena 
that do not degrade in high dimensions. The information- 
theoretic perspective on convex geometry suggested in this 
paper appears to be bearing fruit; for instance, in [13], we 
use some of the results in this paper as one ingredient 
(among several) to prove a "reverse entropy power inequality 
for convex measures" analogous to Milman's reverse Brunn- 
Minkowski inequality [39], [41], [40], [45] for convex bodies. 

We conclude with some open questions. First, the question 
of characterizing the $\kappa$-concavity properties of infinitely 
divisible laws using only knowledge of the characteristic function 
is an interesting one, as discussed in Section IX-C. One also 
hopes that Theorem I.3 can be improved to only require $\beta > n$; 
this would immediately imply that many of the results in 
this paper stated for $\kappa$-concave measures with $\kappa \ge -1$ (or 
their densities) would have extended validity to general convex 
measures. And finally, it would be nice to use the entropic 
formulation of the hyperplane conjecture to improve the 
state-of-the-art partial results that exist. 

Appendix A 
The multivariate Pareto distribution 

There does not seem to be a canonical definition of a 
multivariate version for the Pareto distribution, although 
various versions appear to have been examined in the actuarial 
literature (see, e.g., [38], [51], [2]). For our purposes, the 
distribution with density $f_{\beta,a}$ defined in (6) is the relevant 
generalization. In this Appendix, we collect some simple 
observations about this multivariate Pareto family. Recall that 

\[ f_{\beta,a}(x) = \frac{1}{Z_n(\beta,a)}\, (a + x_1 + \cdots + x_n)^{-\beta}, \qquad x_i > 0. \]

Lemma A.1. For any $a > 0$, the normalizing factor 

\[ Z_n(\beta, a) = \int_{\mathbb{R}^n_+} \frac{dx}{(a + x_1 + \cdots + x_n)^{\beta}} \]

is finite if and only if $\beta > n$. Moreover, for $\beta > n$, 

\[ Z_n(\beta, a) = \frac{1}{(\beta-1)\cdots(\beta-n)} \cdot \frac{1}{a^{\beta-n}}. \]

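Proof: We prove the desired statement by induction. First, 

\[ Z_1(\beta, a) = \int_0^\infty (a+x)^{-\beta}\,dx = \int_a^\infty y^{-\beta}\,dy = \begin{cases} \dfrac{a^{1-\beta}}{\beta-1} & \text{if } \beta > 1, \\ \infty & \text{if } \beta \le 1. \end{cases} \]

Now assume that the statement is true for $Z_{n-1}$, and observe 
that 

\[ Z_n(\beta, a) = \int_0^\infty dx_n \int_{\mathbb{R}^{n-1}_+} (a + x_1 + \cdots + x_n)^{-\beta}\,dx_1 \cdots dx_{n-1} = \int_0^\infty Z_{n-1}(\beta, a + x_n)\,dx_n \]
\[ = \frac{1}{(\beta-1)\cdots(\beta-n+1)} \int_0^\infty \frac{dx_n}{(a + x_n)^{\beta-n+1}} = \frac{Z_1(\beta-n+1, a)}{(\beta-1)\cdots(\beta-n+1)} = \frac{1}{(\beta-1)\cdots(\beta-n)} \cdot \frac{1}{a^{\beta-n}}, \]

which is the required conclusion for $Z_n$. ■ 

In particular, $f_{\beta,a}$ is not a well defined density for $\beta \le n$, 
and there is no Pareto distribution with such parameters. 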
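
The closed form for the normalizing factor can be cross-checked by a one-dimensional quadrature. The sketch below is only an illustration under the stated reduction: since the set of $x \in \mathbb{R}^n_+$ with $x_1 + \cdots + x_n \in ds$ has volume $s^{n-1}/(n-1)!\,ds$, one has $Z_n(\beta,a) = \int_0^\infty \frac{s^{n-1}}{(n-1)!}(a+s)^{-\beta}\,ds$, and the substitution $s = au/(1-u)$ maps this to $(0,1)$. The function names are hypothetical, and the midpoint rule is used for simplicity (it is best behaved when $\beta \ge n+1$).

```python
import math


def z_closed_form(n: int, beta: float, a: float) -> float:
    """Z_n(beta, a) = a**(n - beta) / ((beta-1)...(beta-n)), as in Lemma A.1."""
    prod = 1.0
    for i in range(1, n + 1):
        prod *= beta - i
    return a ** (n - beta) / prod


def z_numeric(n: int, beta: float, a: float, steps: int = 100_000) -> float:
    """Midpoint quadrature for Z_n after reducing to the sum s = x_1 + ... + x_n.

    After s = a*u/(1-u), the integrand becomes
    a**(n-beta) * u**(n-1) * (1-u)**(beta-n-1) / (n-1)!  on u in (0, 1).
    """
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        total += u ** (n - 1) * (1.0 - u) ** (beta - n - 1.0)
    return a ** (n - beta) * total / (steps * math.factorial(n - 1))


for n, beta, a in ((1, 2.0, 1.0), (3, 5.5, 2.0), (4, 9.0, 0.5)):
    print(n, beta, a, z_numeric(n, beta, a), z_closed_form(n, beta, a))
```

The reduced integral is a Beta integral, $B(n, \beta-n)/(n-1)! = \Gamma(\beta-n)/\Gamma(\beta)$, which gives an independent derivation of the product formula.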

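Lemma A.2. For any $a > 0$ and $\beta > n$, the entropy of the 
multivariate Pareto distribution $f_{\beta,a}$ is given by 

\[ h(f_{\beta,a}) = \log\|f_{\beta,a}\|^{-1} + \sum_{i=1}^{n} \frac{\beta}{\beta-i}. \]

Proof: If $Y \sim f_{\beta,a}$, then 

\[ h(Y) = \log Z_n(\beta, a) + \beta\, \frac{L_n(\beta, a)}{Z_n(\beta, a)}, \]

where 

\[ L_n(\beta, a) = \int_{\mathbb{R}^n_+} \frac{\log(a + x_1 + \cdots + x_n)}{(a + x_1 + \cdots + x_n)^{\beta}}\,dx. \]

With this notation, what we wish to prove is that 

\[ \frac{L_n(\beta, a)}{Z_n(\beta, a)} = \log a + \sum_{i=1}^{n} \frac{1}{\beta-i}. \tag{27} \]

As in the proof of Lemma A.1, one can write the recursion 

\[ L_n(\beta, a) = \int_0^\infty L_{n-1}(\beta, a + x_n)\,dx_n, \]

and it is a simple exercise using integration by parts to see 
that 

\[ L_1(\beta, a) = Z_1(\beta, a) \Big[ \frac{1}{\beta-1} + \log a \Big]. \tag{28} \]

Our goal is to prove the identity (27) by induction. To this 
end, we compute using the induction hypothesis for $n-1$: 

\[ L_n(\beta, a) = \int_0^\infty Z_{n-1}(\beta, a+y) \Big[ \log(a+y) + \sum_{i=1}^{n-1} \frac{1}{\beta-i} \Big]\,dy \]
\[ = \frac{1}{(\beta-1)\cdots(\beta-n+1)} \int_0^\infty \frac{\log(a+y)}{(a+y)^{\beta-n+1}}\,dy + Z_n(\beta, a) \sum_{i=1}^{n-1} \frac{1}{\beta-i}. \]

Recognizing the integral in the last expression as $L_1(\beta - n + 1, a)$ 
and plugging in the evaluation (28), simple manipulations 
give us (27). Observing that $\|f_{\beta,a}\|^{-1} = Z_n(\beta, a)\, a^{\beta}$, the proof 
of Lemma A.2 is complete. ■ 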
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